FLAC file sizes

  • frankwm

    #16
    Originally posted by Gordon View Post
    Frank's suggestion of a contribution due to FR is worth thinking about not least because it leads to greater signal activity that challenges prediction.
    Out of curiosity tried that.

    Took a 2min section from Ansermet/LSO 1950 Boutique Fantasque side 1 (5.53 to 7.53 - lots of percussion going to 30kHz ) - saved it as 32bit float WAV (transcribed originally as a 24/96 WAV: latterly do transcriptions as 32bit float/96).


    WAV = 19.9MB ... 24/96 FLAC = 11.4MB ... 24/96 + 20kHz/24dB/8ve low-pass filter = 11.3MB ... ditto @10kHz = 10.1MB ... ditto @5kHz = 9.17MB.

    With just a High-Pass filter: no change @10Hz to 200Hz = @11.4MB.

    The test file was re-set to original data each time for the above FR adjustments.
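Frank's results are consistent with how FLAC works: the encoder fits a short linear predictor to each block and entropy-codes the prediction residual, so a band-limited (smoother) signal leaves smaller residuals and hence a smaller file. A rough stdlib-Python sketch of that mechanism - a toy fixed second-order predictor with a crude bit-cost estimate, not real FLAC:

```python
import math, random

def residual_bits(samples, order=2):
    """Rough bit cost of coding the residuals of a fixed second-order
    linear predictor (FLAC's 'fixed' predictors work along these lines):
    predict x[n] ~ 2*x[n-1] - x[n-2] and count ~log2 bits per residual."""
    bits = 0
    for n in range(order, len(samples)):
        predicted = 2 * samples[n - 1] - samples[n - 2]
        residual = samples[n] - predicted
        bits += max(abs(residual), 1).bit_length() + 1  # +1 for the sign
    return bits

random.seed(0)
fs = 96000
n = 4096
# "wideband" source: a 1kHz tone plus broadband (HF-rich) noise
wide = [int(10000 * math.sin(2 * math.pi * 1000 * t / fs))
        + random.randint(-2000, 2000) for t in range(n)]
# "low-passed" source: the same tone without the noise (much smoother)
smooth = [int(10000 * math.sin(2 * math.pi * 1000 * t / fs))
          for t in range(n)]

# the smoother signal needs far fewer residual bits, hence smaller files
print(residual_bits(wide) > residual_bits(smooth))  # True
```

The same mechanism may explain why the high-pass filtering changed nothing: at a 96kHz sample rate, content below 200Hz is already almost perfectly predictable sample-to-sample, so removing it saves very few residual bits.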


    • Gordon
      Full Member
      • Nov 2010
      • 1424

      #17
Also out of curiosity I took a recent digital harpsichord recording [lots of jangling] recorded at low level [Goldberg variation 1] and coded it to FLAC at level 8. Results: WAV [44.1/16] at full bandwidth, 27.356MBytes, became 17.178MBytes as FLAC, CR 1.59. The same bandlimited to 10kHz [Audacity's sharpest cut-off] gave 12.403MBytes, CR 2.21, and at 5kHz 9.497MBytes, CR 2.88. Looks like it's not just level then.


      • Dave2002
        Full Member
        • Dec 2010
        • 17871

        #18
        Originally posted by frankwm View Post
        Out of curiosity tried that.

        Took a 2min section from Ansermet/LSO 1950 Boutique Fantasque side 1 (5.53 to 7.53 - lots of percussion going to 30kHz ) - saved it as 32bit float WAV (transcribed originally as a 24/96 WAV: latterly do transcriptions as 32bit float/96).


        WAV = 19.9MB ... 24/96 FLAC = 11.4MB ... 24/96 + 20kHz/24dB/8ve low-pass filter = 11.3MB ... ditto @10kHz = 10.1MB ... ditto @5kHz = 9.17MB.

        With just a High-Pass filter: no change @10Hz to 200Hz = @11.4MB.

        The test file was re-set to original data each time for the above FR adjustments.
Was that from an LP? If from a CD there should be virtually nothing above 22kHz. It's possible that some LPs might go higher, though it's also possible that any very high frequencies are simply various artefacts or distortion - e.g. tracing distortion. Even if the HF sounds clean it may not be.

        I assume you did your own digitisation.
        Last edited by Dave2002; 05-03-16, 22:06.


        • Gordon
          Full Member
          • Nov 2010
          • 1424

          #19
          Originally posted by Dave2002 View Post
Reducing the signal gain (de-amplification as you have called it) by appropriate levels is equivalent to reducing the bit depth (integer arithmetic dividing by 2 is equivalent to losing the least significant bit in an integer representation), so effectively the noise levels will increase. As noted this does result in smaller compressed file sizes. Do the gain reduction enough times and eventually the noise will become much more audible.

This is on the assumption of typical musical (or non-musical) content. It's not too difficult to imagine digitally encoded waveforms where there would be no loss into noise due to loss of the low-order bits, but such signals would already be severely compromised. Example - take a music sample and replace the bottom 8 bits by zeroes. It will start to sound bad, but reducing the amplitude by even a factor of 256 would have no more effect on the quality (already compromised), but it would make it sound quieter. Reducing by 512 would make the quality worse, however.
Careful!! In PCM [WAV] the Qnoise is dependent only on the quant step, assumed constant, not the signal level - until the signal is very low, when the Qnoise becomes non-noise-like and more harmonic, and then it is signal dependent. To avoid this the minimum signal must occupy at least 10 q steps, preferably more.

Reducing signal doesn't change the q step size; it only affects the ratio with noise, not the noise power. IF the noise is in the input signal then a reduced level also means reduced noise too, since the S/N is not affected. The argument about the effective loss of the LSB applies to the S/N; when we calculate S/N then taking the signal level down affects the S/N because the "signal" is less than it can be. There are several definitions of S/N in PCM depending on what "signal" means. Keeping headroom of 6dB down from the clippers means that the signal is half what it could be for max S/N, and so if playback gain is increased by 6dB to compensate then clearly the q noise comes up too, but is not noticeable if the S/N is large enough.

          When FLAC codes an input with a low signal level it means that the S/N is low and so the "noise" is a greater part of what is coded but its power is not changed. That implies that a FLAC file should be larger because of the need to meet bit for bit accuracy with noise present. Experience reported above suggests that low amplitude signal levels with noisy input produces smaller files ie the low signal gains outweigh the effect of noise.
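Dave2002's point that integer gain reduction discards least significant bits, and Gordon's point that the quantisation step itself is unchanged, can both be seen with a few lines of integer arithmetic - an illustration only, assuming plain integer PCM sample values:

```python
def attenuate_and_restore(sample, halvings):
    """Reduce gain by 2**halvings using integer arithmetic, then
    restore it: the bottom 'halvings' bits are permanently lost."""
    return (sample >> halvings) << halvings

x = 12345  # an arbitrary 16-bit sample value

# one halving (-6dB) loses the LSB; eight halvings lose the bottom byte
print(attenuate_and_restore(x, 1))  # 12344
print(attenuate_and_restore(x, 8))  # 12288

# the q step is unchanged throughout - what grows is the worst-case
# error relative to the signal, i.e. the S/N falls ~6dB per halving
for k in (1, 4, 8):
    worst_case = 2 ** k - 1
    print(k, x - attenuate_and_restore(x, k), "<=", worst_case)
```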


          • frankwm

            #20
            Originally posted by Dave2002 View Post
Was that from an LP? If from a CD there should be virtually nothing above 22kHz. It's possible that some LPs might go higher, though it's also possible that any very high frequencies are simply various artefacts or distortion - e.g. tracing distortion. Even if the HF sounds clean it may not be.

            I assume you did your own digitisation.
            ..no need to assume - assuming the link works for you - then all the info is there: HF content is hardly 'artifacts/distortion': Ricci's Tchaikovsky Vln Conc, also from 1950, has Violin upper EHF harmonics accurately portrayed: do know what I'm doing!


            • Gordon
              Full Member
              • Nov 2010
              • 1424

              #21
              Originally posted by frankwm View Post
              ..no need to assume - assuming the link works for you - then all the info is there: HF content is hardly 'artifacts/distortion': Ricci's Tchaikovsky Vln Conc, also from 1950, has Violin upper EHF harmonics accurately portrayed: do know what I'm doing!
Not wishing to imply you don't know what you're doing, but some further explanation seems in order... I don't doubt that there is spectral activity beyond 20kHz in your files, but I am still open to being convinced that all the EHF does indeed come directly from the instruments being captured. My instincts, born of experience, tell me that some at least must be spurious.

In the 1950 era Decca used EMI BTR1 and then Ampex machines, neither of which easily achieved wide bandwidth even at 30ips. Decca went to 15ips quite early. They didn't produce zero distortion either, any more than vinyl did/does. Tape machines of that era did well to get better than 5% thd. The extinction frequency of a 1 mil [25 micron] record head gap at 15ips back then was 15kHz, so recording anything higher relies on the hot spot of the head [recording does not take place in the gap but on the trailing edge] and that varies with record + bias current, and also requires the tape formulation to have a high coercivity to hang on to the magnetism it's given and not self-demagnetise; even replay head gaps were of that order too. Extinction means just that, so frequencies where response is useful [it's a sin x/x shape where response is -13dB at half extinction] are well lower than this frequency and would need lots of EQ [noise!] to get back, in addition to normal head losses.

Also it is inherent in mag tape that the HF response depends on bias, and that the machines are set up at low signal levels [typically -20dB on peak modulation], so produce best bandwidth at low levels. Best bandwidth is not achieved at the same bias as best thd. At max signal the [fixed] bias is not correct and so hf suffers badly, and is compensated by whacking the tape to produce distortion that mimics hf. In some respects tape was a retrograde step in the early 50s. If the signal contains strong elements at around 10kHz and 15kHz say, then intermodulation at a few % will produce sum [25kHz] and difference [5kHz] sidebands as well as 2nd etc harmonics [20kHz and 30kHz] that add to the signal. These harmonics [not the intermods, unless the air is non-linear] will be present in the original acoustic signal produced by the instrument, and more than likely the microphones of the time will have had a good go at capturing them, but are you sure that the tape machine could? Even later machines like Studer A80s were not perfect, as I know from experience of aligning them.
              Last edited by Gordon; 05-03-16, 22:41.
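The 15kHz extinction figure above can be sanity-checked: extinction occurs where the recorded wavelength equals the gap length, i.e. f_ext = tape speed / gap length. A quick check in Python, using the 15 ips speed and 1 mil [25 micron] gap quoted in the post:

```python
def extinction_hz(speed_ips, gap_microns):
    """Gap extinction frequency: tape speed divided by gap length."""
    speed_m_per_s = speed_ips * 0.0254      # inches/s -> metres/s
    gap_m = gap_microns * 1e-6              # microns -> metres
    return speed_m_per_s / gap_m

print(extinction_hz(15, 25))   # ~15240 Hz, i.e. ~15kHz as stated
print(extinction_hz(30, 25))   # doubling the speed doubles it
```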


              • frankwm

                #22
Hmmm... well the FLAC example showed why I didn't notice a large variation (orchestral) in file-size between c.8kHz 78 material and <30kHz LP - %-wise it's really not much.

The Tchaikovsky, quoted, shows EHF Violin harmonics (in the Cadenza) as individually spaced 'bars' when viewed via Foobar (80 bar setting, which has a horizontal 25kHz graph division; the 160 bar setting loses that lower division on my screens) - so as Vinyl has near 40kHz bandwidth @ outer-groove (or even more) - then that's a viable FR (not to mention CD-4).

These frequencies don't appear when EMI cut their mono LPs (usually pegging-out before c.15kHz) - even though they would use BTRs: so it depends how high a BTR could theoretically record to... and Decca could cut to... but the Ansermet percussion is very 'hot' - even @ 25kHz.


                • Dave2002
                  Full Member
                  • Dec 2010
                  • 17871

                  #23
                  Gordon - re msg 19 - I'm still thinking hard about that one.

Other issues - this thread has proved to be very interesting. At first sight it looked "obvious" that all FLAC files of the same material should come out much the same size - but now we seem to know that they don't.

I'm not going to assume that frankwm isn't doing sensible things either, but like you I'm suspicious of claims of very high audio frequency components, and if there are any, they don't necessarily come from the instruments being recorded. I believe some pickup cartridges will go up to 40kHz (some were used at one stage for subcarrier detection in one of the quad systems), so if there are discs with very high frequencies they could be used to reproduce those. I don't know if many (any) commercial LPs contain frequencies outside the range of CDs - though I've just noticed that in msg 22 there is a comment re the Ansermet LP. I'm not going to argue with that - and I'm afraid I would not be able to hear the high transients anyway. Maybe there are others.

It is also often the case that other noises may be picked up during recording - some audible to listeners, others not. I have been at (broadcast) recordings where there were fluorescent lights quite close to microphones. I could hear some of the noise from those, but there may well have been more above my hearing range. Such noise would presumably show up in spectral plots of any recordings which actually captured it.

                  Maybe some producers/engineers will check for this, and filter it out anyway. This would reduce or avoid any problems with digital recording at least, and also any problems arising from transcoding and digital compression.

                  We know that it is possible to make recording equipment and microphones which will go up to 100 kHz. Such kit is used in some observations of animals, such as bats and dolphins. However, I think it is normal to restrict the recording range for human consumption at least to under 25 kHz. FM is around 15 kHz, and CDs 22kHz, with theoretically DVDs going up to around 24kHz. Few people claim to have, or are demonstrated to possess, any hearing above 20 kHz.

                  I suspect that mostly equipment is used which doesn't unnecessarily record much material which is outside the range of interest, as such sounds might only be likely to cause problems later and further down the line.


                  • Richard Barrett
                    Guest
                    • Jan 2016
                    • 6259

                    #24
I do a lot of converting between uncompressed and FLAC material in the course of my work. The software I use is the X Audio Compression Toolkit (Xact for short), which is able to convert between pretty much any audio file formats. The FLAC option has a slider which can be set between "fastest" and "smallest". Just as a quick experiment, as a result of reading this interesting thread, I took the same .wav file of 26.8MB (2'32") and encoded it using opposite settings on this slider. The "fastest" setting was almost instantaneous and yielded a .flac file of 15.3MB, while the "smallest" took a few seconds and gave 14.2MB. Not much of a difference, but the fact that there is one at all means that there is more going on in differences between FLAC sizes than differences (e.g. filtering) in the original audio. What exactly it is that's going on to create those differences I haven't been able to find out.


                    • Gordon
                      Full Member
                      • Nov 2010
                      • 1424

                      #25
This one's for anoraks!! Further to #16 and #21 above regarding high audio frequencies, my mind went back to my early years learning about spectrum analysers and the pitfalls of interpreting displays [back then these were real instruments, with enough knobs to make a shuttle pilot blanch]. Unless one is careful one can be misled if certain key maths issues are not dealt with.

One particular point is analysis bandwidth, which is the degree of precision that the analysis [whether by an instrument or a software FFT] achieves. For example a signal's spectrum can be determined by dividing its bandwidth into any number of "buckets" and measuring the power in each bucket. It should be obvious that the smaller each bucket is, the more precise the spectrum detail. It follows that a small bucket measures in a smaller bandwidth, which then leads to smaller power in that bucket, the power of a signal being proportional to its bandwidth. IE the power density per bucket is less but more finely analysed. Conversely a large bucket covers greater bandwidth and more power density, but with less precision.

                      The other factor is the size of the data block being analysed - the resultant spectrum will be the average of the whole block so any particular event is lost to some extent - Audacity 2.0.3 limits itself to 109 seconds.

By not adhering to the maths rules, spurious results can be reported. So looking at software like Audacity and others that compute spectra, one needs to look at the analysis bandwidth, which is expressed through the size of the FFT used. If the size is small, say 512 points [always a power of 2], then precision is lost compared say to 16384, which gives 32x finer resolution but each bucket has 16384/512 = 32x [15dB] less power density. A small FFT can also lead to spurious results, reporting spectral components that may not be present in the source. Audacity [v2.0.3] is limited to a 16384 FFT so cannot resolve better than that. Perhaps other packages like foobar can do better?
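The bucket arithmetic is easy to verify: the bucket (bin) width is the sample rate divided by the FFT size, and the power falling into a bucket scales with that width. A small check in Python, assuming a 96kHz sample rate as in Frank's transfers:

```python
import math

def bin_width_hz(sample_rate, fft_size):
    """Width of one analysis 'bucket' of an N-point FFT."""
    return sample_rate / fft_size

fs = 96000
print(bin_width_hz(fs, 512))     # 187.5 Hz per bucket
print(bin_width_hz(fs, 16384))   # 5.859375 Hz per bucket

# 16384/512 = 32x finer resolution, so ~15dB less power per bucket
print(round(10 * math.log10(16384 / 512), 1))  # 15.1
```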

So, taking Frank's file from #16 [Boutique Fantasque/LSO] and extracting the 2 minute clip he analysed for FLAC size, we have this:



                      Note that it's very active with occasional peaks near to 0dB, often coincident with the tambourine.

So the spectra below illustrate plots taken of this clip using both 512 [top] and 16384 [bottom] FFTs, and also reporting in linear [right] and log [left] frequency [horizontal] scales - the former show the HF region better. To emphasise the point about spurii I have filtered the source at 10kHz with the sharpest LPF that Audacity provides - 48dB per octave, IOW anything at 20kHz is attenuated from its original level by 48dB and at 40kHz by 96dB. That means there is nothing much in the source after 10kHz, yet Audacity reports energy at -72dB density going out to 35kHz and beyond if one uses a small FFT, whereas the 16384 has no such EHF components; anything after 13.5kHz is below -90dB density. So where did those EHF components come from in the upper plots? Does that then also mean that even the 16384 FFT might be misleading? On the whole the visibility of the lines in the 16384 plot suggests that one can believe the display, the resolution being fine enough. Where there is signal above 20kHz its density is 30dB [1000 times] down on the mid band. Is that significant?

                      Last edited by Gordon; 06-03-16, 18:14.
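One mechanism that can paint energy where the source has none is spectral leakage: with a short analysis window, a tone that doesn't fall exactly on a bin centre smears right across the spectrum, and a rectangular window smears far more than a tapered one. A toy stdlib-Python illustration (a naive DFT, not Audacity's actual analysis, which lets you choose both the window size and shape):

```python
import cmath, math

def dft_mag_db(x):
    """Naive DFT magnitude (dB) for the positive-frequency bins."""
    n = len(x)
    mags = []
    for k in range(n // 2):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        mags.append(20 * math.log10(abs(s) / n + 1e-12))
    return mags

n, fs = 128, 96000
f = 1003.7  # a low-frequency tone that doesn't land exactly on a bin
tone = [math.sin(2 * math.pi * f * t / fs) for t in range(n)]
hann = [s * (0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)))
        for t, s in enumerate(tone)]

rect_db = dft_mag_db(tone)   # rectangular window (no taper)
hann_db = dft_mag_db(hann)   # Hann-tapered window

# the rectangular window reports energy in the top bins (>24kHz here)
# tens of dB above the Hann result - leakage, not real signal
print(max(rect_db[n // 4:]) - max(hann_db[n // 4:]) > 20)  # True
```

This is only one possible source of the spurii in the 512-point plots, but it shows how a small analysis window can report EHF that simply isn't in the source.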


                      • frankwm

                        #26
                        My files invariably are recorded to 0dB (to obtain highest resolution: no subsequent 'normalizing' nonsense) so it needn't be a specific instrument that takes it to that point.

Configuring FOOBAR so that the screen is almost taken up with the SPECTRUM function, you would easily see the 30Hz-25kHz bottom division spread - and a real-time analysis of that recording extending to 25kHz (this doesn't happen with all that many Decca LPs - certainly none of the EMI).

                        You'd have to upload a screen-grab - I don't know how (at least to this joint)...the Ricci/Tchaikovsky Cadenza would be interesting to interpret..

Audacity shows a peak @ 25kHz - probably a Decca armature resonance - before dying off towards 40kHz - so am unsure how helpful Audacity is in determining the FR compared to FOOBAR... (could use any number of different cartridge types - the Dynavector Diamond has a <100kHz bandwidth - but have currently used one of my JVC X-1 (CD-4 / Shibata) for some of the more recent files there).

                        Lots of Nostalgic experiments for you (not just FLAC)...but they're not my studio recordings/mastering...!


                        • Dave2002
                          Full Member
                          • Dec 2010
                          • 17871

                          #27
I downloaded the whole of the FLAC Ansermet Boutique - I'll have to come back and tell you exactly which file, but I think it was the first - the 44.1 kHz sample rate one.

I didn't find that it was so stunning to listen to, and I'd say that some parts are a bit congested. Also there is some quite audible low frequency noise. Using a window size of 16384 shows a peak at around 12 Hz and another at about 19 Hz. There's a gap between 20 and 40 Hz. This is based on a section from 1 minute 22 seconds in to 1 minute 41 secs. That sample also shows quite a wide spectrum, going up to around 16 kHz - at about -87dB. Other, less vibrant parts of the track - say around 2 minutes 6 seconds to 2:16 - don't show much above 15 kHz, and indeed, as Gordon suggests, modifying the window length gives different results. A fairly sensible window size of 2048 shows almost all of the spectrum below 10 kHz, and this is also the case for larger window sizes.

                          I tried modifying the window type, but really it doesn't make too much difference, and changing between linear and log displays simply changes the presentation.

There are parts where reducing the window size drastically - the lowest is 128 - appears to show signal up to 20kHz, and even peaks beyond that, but I expect those are spurious.

From 1:55.4 to 2:01.8 I think the "correct" spectrum goes up to 16 kHz (-90dB at the top), but if the window size is reduced to 128 then the spectrum appears to go up to over 20 kHz at -75dB, with a small peak starting to rise above that. Also that particular section does sound a bit congested and distorted, so perhaps all that is happening is that we are seeing the spectrum of some noise starting to influence the overall picture. There are some parts of the data where it is possible to see small peaks rising above 20kHz, perhaps going up to 30 kHz, but these are quite likely to be artefacts - and/or possibly noise.

                          It's moderately fun and interesting to play around!

                          I could probably upload screen grabs later, but it'll take some time, and I do have other things to do. I'd be happy to pursue this later in the week.

