Data Sheet: Recorded Audio Formats

THE CD RECORDING STANDARD WAS FIRST DEFINED in the so-called “Red Book”, released in 1980 by Philips and Sony. The recording method is LPCM (linear pulse-code modulation, often just referred to as PCM), which takes the original analogue wave signal, chops it up into slices, then registers the size of each slice as a series of numbers. Clearly, the finer the slices and the more precisely each slice is measured, the more accurate the recording will be.

But too many fine slices measured too precisely could produce data overkill. This was a particular problem back in the ’80s, when the data capacity of a CD was limited to around 800MB, reduced to roughly 700MB once error correction was allowed for. The fineness of the slicing is set by how often you choose to sample the signal (the sample rate), and the Philips/Sony compromise fixed this at 44,100 samples per second, or 44.1kHz.
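Those two compromises between them fix the CD’s raw data rate, and a quick back-of-the-envelope sum (sketched below in Python; the 74-minute figure is Red Book’s original maximum playing time) shows why capacity was such a worry:

```python
# Raw data rate of Red Book stereo audio: rate x depth x channels.
SAMPLE_RATE = 44_100      # samples per second, per channel
BIT_DEPTH = 16            # bits per sample
CHANNELS = 2              # stereo

bits_per_second = SAMPLE_RATE * BIT_DEPTH * CHANNELS
bytes_per_second = bits_per_second // 8

# A 74-minute disc, before any error-correction overhead.
disc_bytes = bytes_per_second * 60 * 74
print(bits_per_second, "bits/s")              # 1,411,200 bits/s
print(round(disc_bytes / 1_000_000), "MB")    # roughly 780MB
```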

The numerical value of each slice is recorded as a binary number, a string of ones and zeros. The longer the string, the more precise the measurement. Here again, a compromise is needed. Philips/Sony settled on a string 16 bits long, technically called “16-bit quantization”.

Quantization is very like choosing how many different levels of grey to allow in the reproduction of a black and white photo. 16 levels of grey will give you a better sense of the light and dark areas than 8 levels. Cranking that up to 256 levels of grey provides a convincing imitation of an analogue monochrome photograph, but if you’re a radiologist inspecting the detail in an X-ray you’ll want at least 64,000 levels of grey.
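The slicing and measuring is simple enough to sketch in code. Here’s a toy illustration in Python (not a real analogue-to-digital converter, just the arithmetic): sample a 1kHz sine wave 44,100 times a second and round each sample to one of the 65,536 levels 16 bits allow.

```python
import math

SAMPLE_RATE = 44_100   # slices per second
TONE_HZ = 1_000        # a 1kHz test tone

def quantize_16bit(x: float) -> int:
    """Map an analogue value in the range -1.0..1.0 to a signed 16-bit integer."""
    return max(-32_768, min(32_767, round(x * 32_767)))

# 10 milliseconds of "recording": 441 slices, each stored as a 16-bit number.
samples = [quantize_16bit(math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE))
           for n in range(SAMPLE_RATE // 100)]

print(len(samples), "samples; peak value", max(samples))
```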

16/44 PCM, as the CD standard is sometimes called, was chosen because the 44.1kHz sample rate comfortably covers the 20Hz-20kHz range of human hearing (the Nyquist theorem says you must sample at a little over twice the highest frequency you want to capture) and the 16-bit quantization records a range of amplitude that matches the normal volumes of domestic listening.

CDs and MP3s

Since 1980, video reproduction has escalated from 576-line TV, to HD and now to 4K. During the first half of this era audio quality, on the other hand, took a dive as low bit-rate MP3s proliferated.

MP3 audio compression as a concept is brilliant. By mathematically discarding elements of the soundscape that, for a variety of reasons (including, but not limited to, pitch), the human ear won’t appreciate, it can produce an audibly very similar track that may be five to ten times smaller than the LPCM original.

Towards the end of the last century, when storage was tight, this compression was very welcome. But MP3s often used very low bitrates (128kbps and below) that squeezed the life out of the music. Given sufficient headroom (192kbps and above) MP3s can shine, while still providing very useful compression.

Many of the first generation CD recordings were of poor audio quality as sound engineers struggled to get to grips with the new technology. Pundits were quick to point the finger at CD’s 16/44 Red Book standard, although the blame lay elsewhere in the chain of reproduction. The science tells us that, although a necessary compromise 35 years ago, 16/44 recording was finely tuned to human physiology.


Plummeting storage costs and soaring capacities have opened up the opportunity for fresh thinking about how we record and store audio. We can say goodbye to those mushy sub-160kbps MP3s and save our CD tracks as MP3s at 320kbps. Better still, we can use FLAC (the Free Lossless Audio Codec), which shrinks LPCM down to around half its size without throwing away any of the original data.

Introduced at the beginning of this century by the software engineer Josh Coalson, FLAC is a fast, very widely supported codec. It’s free of proprietary patents, very well documented and with an open source reference implementation that manufacturers and application developers are free to use without legal entanglement (unlike, for example, MP3 or Apple’s AAC).

Also unlike MP3, FLAC has no “quality settings” that trade off audio fidelity against file size. However, the same audio file can be compressed to different FLAC “levels”, resulting in smaller or larger files. The levels run from 0 (least compressed) to 8 (most compressed) and the trade off here is simply the time taken to encode and the processing power needed to decode the audio track.
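FLAC itself isn’t in any standard library, but the lossless principle is easy to demonstrate with a stand-in. The sketch below uses Python’s zlib, which, like FLAC, offers numbered compression levels trading encoding effort against output size and, like FLAC, hands back the original data bit for bit:

```python
import math
import zlib

# Fake "PCM" payload: one second of a 440Hz tone as 16-bit samples.
pcm = bytearray()
for n in range(44_100):
    sample = int(30_000 * math.sin(2 * math.pi * 440 * n / 44_100))
    pcm += sample.to_bytes(2, "little", signed=True)

fast = zlib.compress(bytes(pcm), level=1)   # quick encode, larger output
best = zlib.compress(bytes(pcm), level=9)   # slow encode, smaller output

# The level never affects fidelity: both decompress to the identical bytes.
assert zlib.decompress(fast) == pcm
assert zlib.decompress(best) == pcm
print(len(pcm), len(fast), len(best))
```

The same holds for FLAC’s levels 0 to 8: whichever you pick, the decoded audio is identical; only the file size and the encoding time change.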

Hi-Res Audio

Hi-Res equipment and content vendors are urging us to go further. They want us to throw off the shackles of that 1980s Red Book standard and slice our analogue waveforms twice as fast (or faster) and quantify more precisely, using 24 bits or more to size each slice.

The lowest, most feasible and credible of these new Hi-Res standards is 24/96, using 24 bits instead of 16 to store the numerical value of each slice, and each second creating 96,000 slices rather than CD’s 44,100.
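The storage cost of that jump is straightforward arithmetic; a quick sketch comparing the raw stereo data rates:

```python
def pcm_bits_per_second(bit_depth: int, sample_rate: int, channels: int = 2) -> int:
    """Raw PCM data rate: bits per sample x samples per second x channels."""
    return bit_depth * sample_rate * channels

cd = pcm_bits_per_second(16, 44_100)     # Red Book: 1,411,200 bits/s
hires = pcm_bits_per_second(24, 96_000)  # 24/96: 4,608,000 bits/s

print(f"24/96 carries {hires / cd:.1f}x the data of CD")  # about 3.3x
```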

But does this really bring any advantage? Audio engineer and founder of the boutique manufacturing company Mojo Audio, Benjamin Zwickel, doesn’t think so. However, he seems to be talking in terms of living room amplification through loudspeakers. It’s feasible that different principles may apply to delicately tuned balanced armature transducers (like the Nuforce HEM4s) playing directly into your ear.


DSD, Direct Stream Digital, is the format used for Super Audio CD (SACD), the extended audio delivery system introduced con brio at the end of the last century but now largely abandoned. It takes samples 64 times faster than CD, but the resolution of each sample is only one bit, that is to say 1/32,768th of the resolution of CD’s 65,536 levels.

How can you quantize the size of a sample in a single bit? you ask. Good question. You don’t.

It’s the stream of single bits that carries the information. A burst of ones means the value is ascending; a burst of zeros means it’s falling. If the stream alternates between ones and zeros, the value is holding flat.
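A toy version of that up/down scheme can be coded in a few lines. (Real DSD uses delta-sigma modulation with noise shaping; this bare-bones delta modulator only captures the core idea of a one-bit stream tracking a waveform.)

```python
import math

STEP = 0.02  # how far the tracked value moves per bit (toy value)

def delta_encode(signal):
    """1-bit encoder: each bit says 'go up' (1) or 'go down' (0)."""
    estimate, bits = 0.0, []
    for x in signal:
        bit = 1 if x > estimate else 0
        estimate += STEP if bit else -STEP
        bits.append(bit)
    return bits

def delta_decode(bits):
    """Rebuild the waveform by integrating the up/down stream."""
    estimate, out = 0.0, []
    for bit in bits:
        estimate += STEP if bit else -STEP
        out.append(estimate)
    return out

# A slow sine sampled very fast -- the regime where 1-bit tracking works.
signal = [0.5 * math.sin(2 * math.pi * n / 2000) for n in range(4000)]
decoded = delta_decode(delta_encode(signal))
error = max(abs(a - b) for a, b in zip(signal, decoded))
print(f"worst-case tracking error: {error:.3f}")
```

Note that the scheme only works because consecutive samples change slowly relative to the step size; feed it a signal that swings faster than one step per bit and the estimate can’t keep up.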

One way of looking at the difference between CD’s PCM sampling and DSD is to think of the string of CD samples as schoolchildren queuing up to have their height measured. Matron uses a six foot rule, and carefully measures each child to the nearest 8th of an inch.

That was last year. This year the school has suddenly expanded, with 64 times the number of pupils. The new DSD matron gets through them all a lot quicker now by simply making a note of whether each child is taller than, shorter than or the same height as the previous child in the queue.

This analogy, as a moment’s thought will tell you, would fail completely for a string of random values. The technique, called delta-sigma processing, assumes some sort of relationship between consecutive values in the stream: the sort of relationship you would find with samples extracted in rapid succession from a continuous waveform. In that case, delta-sigma processing can produce a very accurate digital representation of the original analogue signal.

How accurate? More accurate than, say, 24/96 FLAC? Again, Benjamin Zwickel of Mojo Audio (from whom I’ve borrowed the PCM versus DSD charts above) steps in with a note of scepticism.


For many music lovers the focus is moving away from owning a collection. Subscription services are the trend. My Yamaha RX-V679 offers me Napster, Spotify and Juke over the Internet, but I much prefer to tap into my own music stored locally on my NAS devices.

If you’re like me, here’s my personal view: stick to FLAC. At 16/44 FLAC will faithfully reproduce your CD collection. If you’re digitising vinyl there’s probably a good case for FLACcing at 24/96.

If you own, or aspire one day to own, a $75,000 speaker system backed by equivalent amplifiers and renderers to a total of, say, $200,000, you’ll probably think nothing of buying your entire music collection all over again as DSD or whatever the HiDef standard du jour might be (MQA emerged recently, and I’ll write more about it here if it turns out to have legs).

But if that’s you, you’re not like me.

Chris Bidmead
