FLAC file sizes

  • Gordon
    Full Member
    • Nov 2010
    • 1425

    FLAC file sizes

    Prompted by a conversation on another thread I tracked down two versions of the same recording from 1946. Both came via Presto, one from RCA [the original owner of the recording] and one from NAXOS, both in 44.1/16 FLAC format.

    The strange thing is that these nominally identical files, the NAXOS running about 5 seconds longer in each movement [silences in run-in and -out], both in FLAC, have quite different file sizes. So when is FLAC not FLAC?! It is a lossless format and works by iterating an algorithm, so why do RCA/Presto choose to publish a file that is almost 3 times larger than NAXOS!! If FLAC is truly lossless then audio quality should not come into it.

    RCA's WAV for the first movement is 79.4 MBytes [NAXOS 79.99] with the FLAC at 38 MBytes, a ratio of about 2:1, which is modest compression. The NAXOS FLAC is 14.8 MBytes, a CR of over 5:1.
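The ratios quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the 14.8 MByte figure is the size of the NAXOS FLAC (the only reading that gives a CR over 5:1), and the function name `compression_ratio` is made up for illustration.

```python
def compression_ratio(wav_mb, flac_mb):
    """Uncompressed size divided by compressed size."""
    return wav_mb / flac_mb


# Sizes in MBytes as quoted in the post (first movement only).
rca_ratio = compression_ratio(79.4, 38.0)
naxos_ratio = compression_ratio(79.99, 14.8)   # assumes 14.8 is the NAXOS FLAC
flac_size_factor = 38.0 / 14.8                 # RCA FLAC relative to NAXOS FLAC

print(f"RCA   CR: {rca_ratio:.2f}:1")
print(f"NAXOS CR: {naxos_ratio:.2f}:1")
print(f"RCA FLAC is {flac_size_factor:.2f}x the NAXOS FLAC")
```

On these figures the RCA FLAC is a little over two and a half times the size of the NAXOS one.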

    Come to think of it when these on-line vendors offer variants who does the FLAC coding, the copyright owner or the publisher?
  • frankwm

    #2
    FLAC sizes are related to the modulation levels - so if a transfer of the same recording is, eg, some dB less - so will be the file-size - 2:1 ratio is roughly normal vis-a-vis WAV when files peak @ 0dB - your Nox-Off data is wrong - it would have to be a mono (as opposed to stereo) FLAC - and quite a bit 'quieter' to be 5:1 by comparison.


    • Bryn
      Banned
      • Mar 2007
      • 24688

      #3
      Morton Feldman's music is famously quiet (in general). Using level 8 FLAC compression, a ratio of just under 3:1* is achievable with the Ives Ensemble recording of Feldman's SQ2.

      * actually 2.956332553270529:1, whereas a 320kbps mp3 comes in with a ratio of 4.409980452403949:1.
      Last edited by Bryn; 05-03-16, 08:56. Reason: predictive text syndrome


      • Dave2002
        Full Member
        • Dec 2010
        • 18034

        #4
        Re msg 1, a few questions.

        1. Do the files truly sound the same?
        2. Are the volume levels similar?
        3. Have you tried doing a spectral analysis of the output file?

        If the recordings are from 1946 it is quite likely that there is some significant noise along with the signal. Perhaps FLAC is trying to encode noise accurately?

        From https://en.wikipedia.org/wiki/FLAC#Compression_levels

        1. FLAC supports only fixed-point samples, not floating-point. It can handle any PCM bit resolution from 4 to 32 bits per sample, any sampling rate from 1 Hz to 655,350 Hz in 1 Hz increments,[10] and any number of channels from 1 to 8.[11]

        2. Channels can be grouped in some cases, for example stereo and 5.1 channel surround, to take advantage of interchannel correlations to increase compression.
        Point 1 indicates that the PCM resolution might have an effect. Since the encoding is using integer arithmetic, the number of bits used would determine the effective noise level in the encoding. Also the sampling rate could make a difference.
        Point 2 should probably be irrelevant, as a 1946 recording would probably be in mono.

        Re "why do RCA/Presto choose to publish a file that is almost 3 times larger than NAXOS!! If FLAC is truly lossless then audio quality should not come into it" presumably someone decided on the bit resolution and the sampling rate for the encoding based (hopefully) on some form of quality assessment. For CDs as source, the bit resolution would be 16 and the sampling rate 44.1 kHz ( or perhaps arguably higher than these - ?? ) in order to achieve true lossless encoding, but in this case the characteristics of the source are not fully known.

        I don't know enough about FLAC encoding and decoding, but is it possible that there are differences in the details of the algorithms used? Although both encoding and decoding are declared to be carried out using integer arithmetic, some calculations could be done using floating point without loss of detail as it is well known that it is possible to get accurate results up to the limits of the resolution used in the floating point representation. There could be differences in the implementation of the encoding algorithm used in the creation of the two different versions. [I'm perhaps clutching at straws here - but it does seem to me to be a possibility ...]

        There is also this
        libFLAC uses a compression level parameter that varies from 0 (fastest) to 8 (smallest). The compressed files are always perfect "lossless" representations of the original data. Although the compression process involves a tradeoff between speed and size, the decoding process is always quite fast and not very dependent on the level of compression.


        • johnb
          Full Member
          • Mar 2007
          • 2903

          #5
          As Dave has said, when a file is encoded as FLAC there are optional compression settings of 0 to 8. "0" is the least compressed and "5" is the default. They all give lossless compression but the higher the compression level the more computing power is required to do the encoding and, to an extent, the decoding. Compression level 5 (the default) is what is normally used.

          Once a file has been encoded it seems that it is impossible to tell the compression level that was used (unless someone decided to put that information into one of the tags).

          Somewhere around 2:1 isn't unreasonable for a compression level around 5; however, a compression ratio of 5:1 does seem very high indeed, so perhaps Naxos were using a different source file.

          It might be interesting to decode each version then re-encode them using the same compression level (say 5).

          (Although FLAC is a command line program there is a convenient, easy to use front end called, surprise, surprise, "FLAC Frontend".)
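The decode-then-re-encode comparison suggested above could be scripted roughly as follows. This is a sketch, not a tested workflow: it assumes the reference `flac` command line tool is installed and on the PATH, and the helper names (`decode_command`, `encode_command`, `reencode`) and output file naming are invented for illustration. It uses the tool's `-d` (decode), `-0` to `-8` (level), `-f` (overwrite) and `-o` (output name) options.

```python
import subprocess
from pathlib import Path


def decode_command(flac_path, wav_path):
    """Command line that decodes a FLAC file to WAV."""
    return ["flac", "-d", "-f", "-o", str(wav_path), str(flac_path)]


def encode_command(wav_path, flac_path, level=5):
    """Command line that encodes a WAV file at a given compression level."""
    if not 0 <= level <= 8:
        raise ValueError("FLAC compression level must be between 0 and 8")
    return ["flac", f"-{level}", "-f", "-o", str(flac_path), str(wav_path)]


def reencode(flac_path, level=5):
    """Decode, re-encode at `level`, and return the new file size in bytes."""
    src = Path(flac_path)
    wav = src.with_name(src.stem + ".decoded.wav")      # hypothetical naming
    out = src.with_name(src.stem + f".level{level}.flac")
    subprocess.run(decode_command(src, wav), check=True)
    subprocess.run(encode_command(wav, out, level), check=True)
    return out.stat().st_size
```

Calling `reencode` on each vendor's file with the same level would show whether the size gap survives identical encoder settings; if it does, the difference lies in the audio data rather than in the compression level.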
          Last edited by johnb; 05-03-16, 12:31.


          • Bryn
            Banned
            • Mar 2007
            • 24688

            #6
            Later today I intend to rip and FLAC (level 8) Frank Zappa's recorded performance of Cage's 4'33". Any predictions as to the compression ratio that might result?


            • Beef Oven!
              Ex-member
              • Sep 2013
              • 18147

              #7
              Originally posted by Bryn View Post
              Later today I intend to rip and FLAC (level 8) Frank Zappa's recorded performance of Cage's 4'33". Any predictions as to the compression ratio that might result?
              Why Zappa, haven't you got a HIPP version?


              • Bryn
                Banned
                • Mar 2007
                • 24688

                #8
                Originally posted by Beef Oven! View Post
                Why Zappa, haven't you got a HIPP version?
                As far as I know, there is no available recording of a David Tudor performance of the work. However, Zappa's performance is itself essentially HIPP. Decidedly unHIPP, as far as I am concerned, is the Amadinda recording.


                • Gordon
                  Full Member
                  • Nov 2010
                  • 1425

                  #9
                  Thanks all!

                  FLAC sizes are related to the modulation levels - so if a transfer of the same recording is, eg, some dB less - so will be the file-size - 2:1 ratio is roughly normal vis-a-vis WAV when files peak @ 0dB - your Nox-Off data is wrong - it would have to be a mono (as opposed to stereo) FLAC - and quite a bit 'quieter' to be 5:1 by comparison.
                  The mono option in FLAC may have been used in one and not the other, so I’d expect that to halve the file size [approx.]. I’m still trying to understand how the amplitude of the programme material affects file size. The predictors available to the encoder are mostly defined, so the file size probably depends more on what happens in the residual coding. That is true for MP3 etc. as well, but there the bit rate target defines the lossiness. If the variance of the residual is large, because the programme material itself is noisy for example [as Dave suggests] OR because the data block is very active, then getting bit for bit will be that bit harder and so the file larger. See more below. If the dynamic range is large within a coding block [typically a fraction of a second - I'm reminded of NICAM, which is a DPCM too and relies on a sample block size that has restricted DR] then the encoder MAY have trouble because of a large variance and, try as it might by iterating its options, it will struggle to keep the file small while meeting the bit for bit criterion. BUT that small block size suggests that this might not be a greatly significant factor, because each block predicts itself, the whole point being to get the predictors to do a good job across the whole file?

                  What is the “Nox-Off” data? I’m not familiar with the term. My file sizes are from Windows Explorer.

                  The objective in FLAC is not to hit a target file size but to keep bit for bit accuracy. That means that file size will vary as a result of parameter choice, eg as Bryn points out below. Looking at some of the descriptive material on the web [thanks Dave – see below – I’ve also seen that stuff since last night!] it seems that the ENCODER will make a smaller file at setting 8 rather than 0, but it also seems that this is not dramatic, the speed of encoding being the factor that changes most, and that is not relevant here.

                  Morton Feldman's music is famously quiet (in general). Using level 8 FLAC compression, a ratio of just under 3:1* is achievable with the Ives Ensemble recording of Feldman's SQ2.

                  * actually 2.956332553270529:1, whereas a 320kbps mp3 comes in with a ratio of 4.409980452403949:1.
                  Thanks for that. Level 8 is the slowest setting and gives the smallest file consistent with bit for bit accuracy. Level 5 is the default and level 0 is the fastest, compressing least. Coding speed matters most in real time codecs; for file transfer, offline coding is likely and one would expect level 8 to be used. One assumes that the servers store FLACs and do not code on the fly for streaming or for transfer.

                  Re msg 1, a few questions.

                  1. Do the files truly sound the same?
                  As far as I can tell, yes they do but, as I said before, the RCA transfer is slightly noisier and that could have its effect [see above].

                  2. Are the volume levels similar?
                  Almost identical as seen in Audacity in both linear and dB plots. However it looks as if RCA has “ridden the faders” a bit because the quieter sections look a bit louder than the equivalent NAXOS but this is not apparent on listening.

                  3. Have you tried doing a spectral analysis of the output file?
                  Yes and they look very similar with RCA curtailing the rumble region more. Spectral levels also reflect the similar audio levels as shown in Audacity. Looking at the amplitude plots and the spectra there is less than 3dB between them across the whole file [one whole movement used for our purposes here].

                  If the recordings are from 1946 it is quite likely that there is some significant noise along with the signal. Perhaps FLAC is trying to encode noise accurately?
                  As above this is a good point but it’s hard to see what the extent of the effect on file size is given also that the additional noisiness of the RCA is not large. Nevertheless there could be a stronger linkage between the residual coding performance when there is more unpredictable content.

                  From https://en.wikipedia.org/wiki/FLAC#Compression_levels

                  1. FLAC supports only fixed-point samples, not floating-point. It can handle any PCM bit resolution from 4 to 32 bits per sample, any sampling rate from 1 Hz to 655,350 Hz in 1 Hz increments,[10] and any number of channels from 1 to 8.[11]

                  2. Channels can be grouped in some cases, for example stereo and 5.1 channel surround, to take advantage of interchannel correlations to increase compression. Point 1 indicates that the PCM resolution might have an effect. Since the encoding is using integer arithmetic, the number of bits used would determine the effective noise level in the encoding. Also the sampling rate could make a difference.

                  Point 2 should probably be irrelevant, as a 1946 recording would probably be in mono.

                  Re "why do RCA/Presto choose to publish a file that is almost 3 times larger than NAXOS!! If FLAC is truly lossless then audio quality should not come into it" presumably someone decided on the bit resolution and the sampling rate for the encoding based (hopefully) on some form of quality assessment. For CDs as source, the bit resolution would be 16 and the sampling rate 44.1 kHz ( or perhaps arguably higher than these - ?? ) in order to achieve true lossless encoding, but in this case the characteristics of the source are not fully known.
                  Yes to each for points 1 and 2: both FLACs are supposedly 44.1/16; presumably the restorers worked in a larger gamut and then reduced. What each has done is unknown but the results are very similar. The strongest point is that one may be mono. It is possible, especially given that bit for bit is the criterion that has to be met and that a wide choice of sample rate and bit depth is available, that in fact one of these transfers was done at less than 44.1 [there’s not a lot above 7kHz in the source] and less than 16 bits [the decoder will know and give back a 44.1/16 WAV] but I think that unlikely. If it is the case then there is a bit of disingenuousness on the part of the vendors even if the file is bit for bit!!
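Whether one of the files is mono, or stored at a lower sample rate or bit depth, need not be guessed: a FLAC stream records these in its STREAMINFO metadata block, which comes first in the file. The sketch below reads that block with the Python standard library only. It assumes the file begins directly with the `fLaC` marker (no ID3 tag prepended), and the function names are invented for illustration.

```python
def parse_streaminfo(data):
    """Extract sample rate, channels, bit depth and length from FLAC header bytes."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (or a tag precedes the marker)")
    block_type = data[4] & 0x7F                 # low 7 bits: metadata block type
    length = int.from_bytes(data[5:8], "big")   # 24-bit block length
    if block_type != 0 or length < 34:
        raise ValueError("first metadata block is not STREAMINFO")
    body = data[8:8 + 34]
    # Bytes 10..17 pack: sample rate (20 bits), channels-1 (3 bits),
    # bits-per-sample-1 (5 bits), total samples (36 bits).
    packed = int.from_bytes(body[10:18], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def read_streaminfo(path):
    """Read just enough of a file to parse its STREAMINFO block."""
    with open(path, "rb") as handle:
        return parse_streaminfo(handle.read(42))
```

Running `read_streaminfo` on both downloads would settle the mono question directly: a `channels` value of 1 against 2 would by itself account for roughly a factor of two in file size.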
                  Last edited by Gordon; 05-03-16, 13:22.


                  • frankwm

                    #10
                    Nox-Off = Naxos. (ie: "Knock-Offs")

                    Why don't you just experiment?
                    Take a 16/44 stereo WAV file - then (via Audacity) convert to mono and/or reduce the modulation (use the 'Amplify' app) and try different FLAC settings (0-8).

                    OK, I'll do it for ya, via Audacity (2.1.2):-

                    My 2011 Bliss:"Welcome the Queen" transfer (16/44 stereo WAV = 65.00MB)...Noisy music throughout - file peaks @ minus 1.12dB)

                    To stereo FLAC ('8'-'best': what I always use) = 39.3MB ... '4' = 39.4MB ... ('0'-'fastest') = 40.9MB (got this result twice).

                    Stereo WAV to mono FLAC ('8') - 19.1MB (the stereo tracks combined as mono beforehand - some 'cancellation' takes place).

                    Stereo WAV 'de-amplified' by minus 3dB (ie: file now peaks @ minus 4.12dB) = 36.6MB - as opposed to the previous 39.3MB..


                    • Gordon
                      Full Member
                      • Nov 2010
                      • 1425

                      #11
                      Originally posted by frankwm View Post
                      Nox-Off = Naxos. (ie: "Knock-Offs")

                      Why don't you just experiment?
                      Take a 16/44 stereo WAV file - then (via Audacity) convert to mono and/or reduce the modulation (use the 'Amplify' app) and try different FLAC settings (0-8).

                      OK, I'll do it for ya, via Audacity (2.1.2):-

                      My 2011 Bliss:"Welcome the Queen" transfer (16/44 stereo WAV = 65.00MB)...Noisy music throughout - file peaks @ minus 1.12dB)

                      To stereo FLAC ('8'-'best': what I always use) = 39.3MB ... '4' = 39.4MB ... ('0'-'fastest') = 40.9MB (got this result twice).

                      Stereo WAV to mono FLAC ('8') - 19.1MB (the stereo tracks combined as mono beforehand - some 'cancellation' takes place).

                      Stereo WAV 'de-amplified' by minus 3dB (ie: file now peaks @ minus 4.12dB) = 36.6MB - as opposed to the previous 39.3MB..
                      Thanks Frank, what a gent you are!!

                      That sort of wraps it up. Mono halves the file and reducing the level by 3 dB gets another 8% or so. With a "normal" CR of 2:1 or so the results combine to give about 4.3 on that material. One could imagine 5:1 being possible with the right material.

                      Still don't see how the level affects the file size but I'll work on it - the effect doesn't seem that strong if 3dB gets 8% although that could be material again.
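One rough way to see why level affects size: if the part of the signal the predictor cannot remove behaves like noise that scales with the signal, then every halving of amplitude (about 6.02 dB) removes one bit per sample from what has to be stored. The sketch below applies that back-of-envelope model to Frank's figures. It is an assumption-laden estimate, not a description of how the encoder is specified; it ignores file headers and frame overhead.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB for each factor of two


def bits_saved_per_sample(attenuation_db):
    """Bits per sample saved if the residual shrinks with the signal level."""
    return attenuation_db / DB_PER_BIT


# Frank's figures: 16-bit stereo WAV of 65.0 MB, level 8 FLAC of 39.3 MB.
wav_mb, flac_mb, bit_depth = 65.0, 39.3, 16

# Average bits the FLAC spends per 16-bit sample.
average_bits = bit_depth * flac_mb / wav_mb

saved = bits_saved_per_sample(3.0)
predicted_mb = flac_mb * (average_bits - saved) / average_bits

print(f"average bits per sample: {average_bits:.2f}")
print(f"bits saved by -3 dB:     {saved:.2f}")
print(f"predicted FLAC size:     {predicted_mb:.1f} MB (measured: 36.6 MB)")
```

The model predicts a file slightly larger than the one Frank measured, so it accounts for most but not all of the reduction; the remainder may well depend on the material.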


                      • frankwm

                        #12
                        You could try some frequency-based experiments - not tried that, as had never been taken-aback by any substantial FLAC file-size differences - ie: say an EMI 78 transcription to LP (HF invariably limited to 8kHz, or so) compared to an early '50's Decca mono (some go to c.30kHz) - ignoring any playback equipment 'contribution'....but it seems unlikely Naxos would use 16/22, or some weird combination, to achieve 5:1...so maybe FR is some missing part of your query.


                        • Bryn
                          Banned
                          • Mar 2007
                          • 24688

                          #13
                          To emphasise the point re. dynamic levels, the level 8 FLAC of the Cage/Zappa comes out with a ratio of 5.949704544960752:1. Bear in mind it is not the 'Silent Piece' its sobriquet might suggest, though all the sounds are incidental to the performance.


                          • Gordon
                            Full Member
                            • Nov 2010
                            • 1425

                            #14
                            Originally posted by Bryn View Post
                            To emphasise the point re. dynamic levels, the level 8 FLAC of the Cage/Zappa comes out with a ratio of 5.949704544960752:1. Bear in mind it is not the 'Silent Piece' its sobriquet might suggest, though all the sounds are incidental to the performance.
                            Thanks Bryn, that puts some scale to the question of programme level.

                            Let's call it 6:1, so with a normal ratio of 2:1 the drop in activity leads to an effective CR of about 3:1, or 300%. Frank tells us that a 3dB reduction leads to an 8% reduction on a noisy piece of content. Your Cage piece would be low noise and low level so well placed for good compression.

                            Frank's suggestion of a contribution due to FR is worth thinking about, not least because a wider FR leads to greater signal activity that challenges prediction. FLAC's two stages of Prediction + Residual coding need very good prediction options to assure good compression. As in lossy MPEG, prediction itself does no compression; it just changes the signal statistics to make compression possible. Maybe the dependence on level is due to prediction options not being scalable enough across the types of sample block statistics.
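The two stages described above can be imitated in a toy model to see how noise and level change the coded size. The sketch below is not FLAC itself: it uses a single fixed second-order predictor and one Rice parameter for the whole signal, with a synthetic 440 Hz tone plus uniform noise standing in for real programme material. All names and signal parameters are made up for illustration.

```python
import math
import random


def make_signal(n, amplitude, noise_amplitude, seed=0):
    """440 Hz tone at 44.1 kHz plus uniform noise, rounded to integer samples."""
    rng = random.Random(seed)
    step = 2 * math.pi * 440 / 44100
    return [
        round(amplitude * math.sin(step * i)
              + rng.uniform(-noise_amplitude, noise_amplitude))
        for i in range(n)
    ]


def order2_residual(samples):
    """Residual of the fixed predictor x[n] ~ 2*x[n-1] - x[n-2]."""
    return [samples[i] - 2 * samples[i - 1] + samples[i - 2]
            for i in range(2, len(samples))]


def rice_bits_per_sample(residual):
    """Average Rice code length, using the best single parameter k."""
    def bits(k):
        total = 0
        for e in residual:
            u = 2 * e if e >= 0 else -2 * e - 1   # fold the sign into an unsigned value
            total += (u >> k) + 1 + k             # unary quotient, stop bit, k-bit remainder
        return total
    return min(bits(k) for k in range(16)) / len(residual)


clean = rice_bits_per_sample(order2_residual(make_signal(4096, 10000, 0)))
noisy = rice_bits_per_sample(order2_residual(make_signal(4096, 10000, 300)))
quiet = rice_bits_per_sample(order2_residual(make_signal(4096, 5000, 150)))

print(f"clean tone:            {clean:.2f} bits/sample")
print(f"tone plus noise:       {noisy:.2f} bits/sample")
print(f"same, at half level:   {quiet:.2f} bits/sample")
```

In this model the predictor removes almost all of the tone, so the cost is dominated by the noise it cannot predict, and halving the level of the noisy signal shrinks the residual and with it the bits needed per sample.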


                            • Dave2002
                              Full Member
                              • Dec 2010
                              • 18034

                              #15
                              Originally posted by frankwm View Post

                              Stereo WAV 'de-amplified' by minus 3dB (ie: file now peaks @ minus 4.12dB) = 36.6MB - as opposed to the previous 39.3MB..
                              Reducing the signal gain (de-amplification as you have called it) by appropriate levels is equivalent to reducing the bit depth (in integer arithmetic, dividing by 2 is equivalent to losing the least significant bit in an integer representation), so effectively the noise levels will increase. As noted this does result in smaller compressed file sizes. Do the gain reduction enough times and eventually the noise will become much more audible.

                              This is on the assumption of typical musical (or non musical) content. It's not too difficult to imagine digitally encoded waveforms where there would be no loss into noise due to loss of the low order bits, but such signals would already be severely compromised. Example - take a music sample and replace the bottom 8 bits by zeroes. It will start to sound bad, but reducing the amplitude by even a factor of 256 would have no more effect on the quality (already compromised), though it would make it sound quieter. Reducing by 512 would make the quality worse, however.
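The 256 versus 512 example above can be checked on a handful of integer samples. This is a minimal sketch with made-up sample values; it uses plain integer division, whereas real audio software would normally round and perhaps dither when reducing gain.

```python
def attenuate(samples, factor):
    """Divide each integer sample by `factor`, discarding the remainder."""
    return [s // factor for s in samples]


def restore(samples, factor):
    """Multiply back up by the same factor."""
    return [s * factor for s in samples]


original = [12345, -20000, 257, -1, 32767]      # arbitrary 16-bit sample values

# Halving ordinary samples loses the least significant bit.
halving_is_lossless = restore(attenuate(original, 2), 2) == original

# Zero the bottom 8 bits, as in the example above.
coarse = [(s >> 8) << 8 for s in original]

# Dividing by 256 only removes bits that are already zero...
by_256_is_lossless = restore(attenuate(coarse, 256), 256) == coarse
# ...but dividing by 512 eats into a bit that still carries information.
by_512_is_lossless = restore(attenuate(coarse, 512), 512) == coarse

print(halving_is_lossless, by_256_is_lossless, by_512_is_lossless)
```

Only the division by 256 of the already coarsened samples survives the round trip unchanged.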
                              Last edited by Dave2002; 05-03-16, 20:32.

