Friday, August 3, 2007

MP3 Techniques

You may wonder how an MP3 file manages to hold so much audio in so little space. To achieve such a large reduction in the amount of data, the MP3 format relies on a few techniques and tricks. Here is a quick overview of them.

The minimum hearing threshold
The ear's minimum hearing threshold (the quietest level it can detect) is not the same at every frequency. As the Fletcher-Munson curves show, it follows a curve that dips between about 2 kHz and 5 kHz, where the ear is most sensitive. There is therefore no need to code sounds that fall below this threshold, because they will not be perceived.
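To make the shape of this curve concrete, here is a small sketch. It uses Terhardt's well-known analytic approximation of the threshold in quiet, which is a swapped-in formula and not the table an actual MP3 encoder ships with. The function names are hypothetical, and levels are assumed to be already expressed in dB SPL (a real encoder has to assume a playback volume to make that mapping).

```python
import math

def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate absolute hearing threshold in dB SPL (Terhardt's formula)."""
    k = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

def audible_components(components):
    """Keep only (freq_hz, level_db_spl) pairs that reach the threshold.

    Assumption: levels are already calibrated to dB SPL.
    """
    return [(f, level) for f, level in components
            if level >= threshold_in_quiet_db(f)]

if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000):
        print(f, "Hz ->", round(threshold_in_quiet_db(f), 1), "dB SPL")
    # A 100 Hz tone at 20 dB is dropped, while a 3.3 kHz tone at 0 dB is kept,
    # because the curve is lowest in the 2 to 5 kHz region.
    print(audible_components([(100, 20.0), (3300, 0.0)]))
```

Components that fall under the curve can simply be left out of the coded signal.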

The masking effect
This technique relies on the masking properties of the human ear. When you look toward the sun and a crow flies past, you do not see it, because the sun's light is far too intense. Hearing works in a similar way: in the presence of loud sounds, you do not hear the weaker ones.

For example, when an organist is not playing, you can hear the air breathing through the pipes; once he plays, you no longer hear it, because it is masked. It is therefore not necessary to code every sound. This is the main property the MP3 format exploits to save space. To do so, the MP3 encoder uses a psychoacoustic model that represents the behavior of the human ear.
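A very simplified masking model can be sketched as follows: a loud tone (the masker) raises the hearing threshold around its own frequency, and the raised threshold falls off with distance on the Bark (critical-band) scale. The Bark conversion below is Zwicker's approximation; the offset and the two slopes are hypothetical round numbers chosen for illustration, not the values of the MP3 reference psychoacoustic model.

```python
import math

def hz_to_bark(freq_hz: float) -> float:
    """Zwicker's approximation of the Bark (critical-band) scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

# Hypothetical spreading parameters, for illustration only.
MASK_OFFSET_DB = 10.0   # threshold sits this far below the masker level
SLOPE_BELOW_DB = 27.0   # dB per Bark for probe tones below the masker
SLOPE_ABOVE_DB = 10.0   # dB per Bark for probe tones above the masker

def masking_threshold_db(masker_hz, masker_db, probe_hz):
    """Level under which a tone at probe_hz is hidden by the masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = SLOPE_ABOVE_DB if dz >= 0 else SLOPE_BELOW_DB
    return masker_db - MASK_OFFSET_DB - slope * abs(dz)

def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe tone is too weak to be heard next to the masker."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)

if __name__ == "__main__":
    # A loud 1 kHz tone hides a quiet tone at 1.1 kHz...
    print(is_masked(1000, 80, 1100, 40))
    # ...but not an equally quiet tone far away at 4 kHz.
    print(is_masked(1000, 80, 4000, 40))
```

The asymmetric slopes reflect the fact that masking spreads further toward higher frequencies than toward lower ones; an encoder can skip, or code coarsely, anything under the combined threshold.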

The byte reservoir
Some passages of a piece of music cannot be coded at a given bitrate without degrading the musical quality. MP3 therefore uses a small reservoir of bytes (often called the bit reservoir) that acts as a buffer: passages that can be coded below the target bitrate leave their unused capacity in the reservoir, and more demanding passages can then draw on it.
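The bookkeeping can be sketched with a simple simulation. It assumes every frame gets the same byte budget at a constant bitrate, and that bytes left unused by a frame can be spent by later frames up to a fixed cap. In MPEG-1 Layer III the back-pointer that makes this possible (`main_data_begin`) is 9 bits wide, so at most 511 bytes can be borrowed; the function name and the demand figures are hypothetical.

```python
def allocate_with_reservoir(demands, frame_budget, reservoir_cap):
    """Grant each frame up to its demand, borrowing bytes saved by earlier frames.

    demands: bytes each frame would like to use (hypothetical figures).
    frame_budget: bytes the constant bitrate gives every frame.
    reservoir_cap: maximum number of unused bytes carried forward.
    """
    reservoir = 0
    granted = []
    for demand in demands:
        available = frame_budget + reservoir
        used = min(demand, available)
        granted.append(used)
        # Whatever this frame did not need goes back into the reservoir,
        # but never more than the reservoir can hold.
        reservoir = min(available - used, reservoir_cap)
    return granted

if __name__ == "__main__":
    demands = [60, 60, 180]  # two easy frames, then a hard one
    print(allocate_with_reservoir(demands, 100, 511))  # with a reservoir
    print(allocate_with_reservoir(demands, 100, 0))    # without one
```

With the reservoir, the two easy frames save 80 bytes and the hard frame gets all 180 bytes it asked for; without it, the hard frame is cut to 100 bytes and its quality suffers.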

The Joint Stereo coding
For a stereo signal, the MP3 format can use a few additional tools, referred to as Joint Stereo (JS) coding, to shrink the compressed file further. Many mid-range hi-fi systems have a single subwoofer, yet you usually do not feel that the sound comes from this speaker, but rather from the satellite speakers. This is because, at very low and very high frequencies, the human ear can no longer locate the spatial origin of sounds with full accuracy.

The MP3 format can therefore (optionally) use the same trick, known as Intensity Stereo (IS). Some frequency bands are then recorded as a single monophonic signal, followed by a small amount of additional information that restores a minimum of spatialisation.

The second joint stereo tool is called Mid/Side (M/S) stereo. When the left and right channels are very similar, a middle channel (L+R) and a side channel (L-R) are encoded instead of left and right. Because the side channel is then close to silent, it needs fewer bits, which reduces the final file size. During playback, the MP3 decoder reconstructs the left and right channels from the middle and side channels.
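Here is a minimal sketch of the Mid/Side transform and its inverse. It uses the 1/sqrt(2) scaling of MPEG-1 Layer III, which keeps the total energy unchanged; the sample values are made up for the example.

```python
import math

SQRT2 = math.sqrt(2.0)

def ms_encode(left, right):
    """Mid/side transform of two sample lists, scaled by 1/sqrt(2)."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse transform: recovers left and right (up to float rounding)."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right

if __name__ == "__main__":
    left = [0.50, 0.80, -0.30]
    right = [0.48, 0.79, -0.31]   # nearly identical channels
    mid, side = ms_encode(left, right)
    print("mid: ", [round(x, 3) for x in mid])
    print("side:", [round(x, 3) for x in side])  # tiny values, cheap to code
    print(ms_decode(mid, side))
```

The transform itself is lossless; the saving comes from the side channel holding only small values, which quantize to few bits.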

The Huffman coding
MP3 also uses the classic Huffman algorithm. It runs at the end of the compression chain to code the information. Unlike the perceptual steps, it discards nothing: it is a lossless coding method rather than a lossy compression technique. It produces variable-length codes, each a whole number of bits long, and more probable symbols get shorter codes. Huffman codes are prefix-free (no code is the prefix of another), so they can be decoded correctly despite their variable length. Decoding is very fast. On average, this kind of coding saves a little less than 20% of space.
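The classic construction is sketched below: repeatedly merge the two least frequent nodes, then read each symbol's code off the path from the root. Note that MP3 itself does not build a tree per file; the standard defines fixed Huffman tables for the quantized spectral values and the encoder chooses among them. The input list here is a hypothetical stand-in for such values, and symbols are assumed to be integers.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(symbols):
    """Build a prefix-free code table {symbol: bitstring} from frequencies."""
    freq = Counter(symbols)
    tie = count()  # tie-breaker so heapq never compares tree nodes
    heap = [(w, next(tie), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)  # two least frequent nodes
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), [a, b]))  # internal node
    codes = {}
    def walk(node, prefix):
        if isinstance(node, list):      # internal node: recurse both ways
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                           # leaf: record the symbol's code
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)

def decode(bits, codes):
    """Prefix-freeness lets us emit a symbol as soon as a code matches."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out

if __name__ == "__main__":
    data = [0, 0, 0, 0, 1, 1, 2, 3]
    codes = huffman_codes(data)
    print(codes)                      # {0: '0', 1: '10', 2: '110', 3: '111'}
    bits = encode(data, codes)
    print(len(bits), "bits instead of", 2 * len(data))  # 14 vs 16
    print(decode(bits, codes) == data)
```

The most frequent value gets a 1-bit code, and the 8 values fit in 14 bits instead of the 16 a fixed 2-bit code would need.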

It is an ideal complement to perceptual coding. During dense polyphonic passages, perceptual coding is very efficient because many sounds are masked or attenuated, but few values repeat, so Huffman coding is rarely effective. During "pure" sounds there are few masking effects, but Huffman coding is then very efficient, because the digitized sound contains many repeated values that are replaced by shorter codes.
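The reasoning above can be checked with Shannon entropy, the lower bound on the average number of bits per symbol that any lossless code can reach (Huffman coding, applied symbol by symbol, gets within one bit of it). The two sequences are hypothetical: one with many repeated values, as for a "pure" sound, and one where every value differs, as in dense polyphony.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy in bits per symbol of a list of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

if __name__ == "__main__":
    repetitive = [0] * 7 + [1]   # mostly the same value
    varied = list(range(8))      # every value different
    print(round(entropy_bits(repetitive), 2), "bits/symbol")
    print(round(entropy_bits(varied), 2), "bits/symbol")
```

The varied sequence needs 3 bits per value, exactly what a fixed-length code already uses, so entropy coding has nothing to gain; the repetitive one has an entropy well under 1 bit per value, leaving plenty of room for shorter codes.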
