
Audio Compression and Coding Techniques - Introduction – Audio Properties, Audio Digitization Codec, The Human Hearing/Vocal Systems and Audio Coding


Jauvane C. de Oliveira
National Laboratory for Scientific Computation, Petropolis, RJ, Brazil

Definition: Audio compression and coding techniques are used to compress audio signals and can be based on sampling or on signal processing of audio sequences.

Audio is the most important medium to be transmitted in a conference-like application. In order to be able to successfully transmit audio through a low bandwidth network, however, one needs to compress it, so that its required bandwidth is manageable.

Introduction – Audio Properties

Sound is a phenomenon that occurs due to the vibration of material. Sound is transmitted through the air, or some other elastic medium, as pressure waves that form around the vibrating material. Consider, for example, the strings of a guitar, which vibrate when plucked. The pressure waves follow a pattern named a wave form and occur repeatedly at regular intervals of time. Each such interval is called a period. The number of periods per second denotes what is known as the frequency of the sound, which is measured in Hertz (Hz), or cycles per second (cps), and is denoted by f. The wavelength is the distance the wave form travels in one period. It may also be understood as the distance between two crests of the wave. The wavelength is denoted by λ. Still with regard to the wave form, the intensity of the deviation from its mean value denotes the amplitude of the sound. Figure 1 shows an example of an audio signal, where we can observe both its amplitude and period. The velocity of sound is given by c = fλ. At sea level and 20° C (68° F), c = 343 m/s.
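As a quick numeric illustration of c = fλ (the 440 Hz tone below is an assumed example, not taken from the article), a short Python sketch:

```python
# Minimal sketch: relation between speed of sound, frequency, wavelength and period.
speed_of_sound = 343.0        # m/s at sea level and 20 °C, as quoted above
frequency = 440.0             # Hz, assumed example tone (concert A)

wavelength = speed_of_sound / frequency   # lambda = c / f
period = 1.0 / frequency                  # duration of one period, in seconds

print(wavelength)   # ~0.78 m between successive wave crests
print(period)       # ~2.27 ms per period
```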

A sound wave is an analog signal, as it assumes continuous values over time. Using a mathematical technique called Fourier Analysis, one can prove that any analog signal can be decomposed as a, possibly infinite, summation of single-frequency sinusoidal signals (see Figure 2). The range of frequencies which build up a given signal, i.e. the difference between its highest and lowest frequency components, is called the signal bandwidth. For a proper transmission of an analog signal, the transmission medium must have a bandwidth equal to or greater than the signal bandwidth. If the medium bandwidth is lower than the signal bandwidth, some of the low and/or high frequency components of the signal will be lost, which degrades the quality of the signal. Such loss of quality is said to be caused by the bandlimiting channel. So, in order to successfully transmit audio in a given medium we need to either select a medium whose bandwidth is at least equal to the audio signal bandwidth or reduce the signal bandwidth so that it fits within the bandwidth of the medium.
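As an illustration (a minimal NumPy sketch with assumed signal components, not a figure from the article), the sinusoidal components of a signal can be recovered with the FFT, and the signal bandwidth read off as the difference between the highest and lowest components present:

```python
import numpy as np

fs = 8000                         # assumed sampling rate in samples per second
t = np.arange(0, 1.0, 1.0 / fs)   # one second of time samples

# Hypothetical signal: a 300 Hz and a 2300 Hz sinusoid added together
signal = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2300 * t)

spectrum = np.abs(np.fft.rfft(signal))          # magnitude of each frequency bin
freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)  # frequency of each bin in Hz

present = freqs[spectrum > 1e-6 * spectrum.max()]   # components actually present
print("lowest component :", present.min(), "Hz")
print("highest component:", present.max(), "Hz")
print("signal bandwidth :", present.max() - present.min(), "Hz")   # 2000 Hz here
```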

Audio Digitization Codec

In order to process audio in a computer, the analog signal needs to be converted into a digital representation. One common digitization technique is Pulse Code Modulation (PCM). Basically, we set a number of valid values on the amplitude axis and then measure the amplitude of the wave a given number of times per second. Measuring the signal at a given rate is referred to as sampling. The sampled values are then rounded up or down to the closest valid value on the amplitude axis. This rounding of samples is called quantization, and the distance from one valid value to the next is referred to as a quantization interval. Each quantization value has a well-defined digital bitword to represent it. The analog signal is then represented digitally by the sequence of bitwords that results from the sampling and quantization. Figure 3 shows this procedure, whose digital representation of the signal is 10100 00000 00010 00010 10010 10101 10101 10011 00011 01000 01001 00111 00010 10011 10011 00001 00100 00101 00101 00110.
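A minimal Python sketch of these two PCM steps, with assumed parameters (a 440 Hz test tone and 5-bit quantization; this is not the signal from Figure 3):

```python
import numpy as np

fs = 8000        # sampling rate in samples per second (assumed)
bits = 5         # bits per sample, i.e. 2**5 = 32 quantization levels
levels = 2 ** bits

t = np.arange(0, 0.01, 1.0 / fs)              # 10 ms of sampling instants
analog = 0.8 * np.sin(2 * np.pi * 440 * t)    # hypothetical input in [-1, 1]

# Quantization: round each sample to the nearest of the 32 levels spanning [-1, 1]
indices = np.round((analog + 1.0) / 2.0 * (levels - 1)).astype(int)

# One bitword per sample; the sequence of bitwords is the digital representation
bitwords = [format(i, "0{}b".format(bits)) for i in indices]
print(bitwords[:8])
```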

Harry Nyquist, a physicist who worked at AT&T and Bell Labs, published in 1927 a study of the optimum sampling rate for a successful digitization of an analog signal. The Nyquist Sampling Theorem states that the sampling frequency must be greater than twice the bandwidth of the input signal in order to allow a successful reconstruction of the original signal from the sampled version. If the sampling is performed at a frequency lower than the Nyquist frequency, the number of samples may be insufficient to reconstruct the original signal, leading to a distorted reconstructed signal. This phenomenon is called aliasing.
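A small sketch of aliasing with assumed frequencies: a 5 kHz tone sampled at only 8 ksps (below its Nyquist rate of 10 ksps) yields exactly the same samples as a 3 kHz tone, so the original cannot be recovered:

```python
import numpy as np

fs = 8000                       # sampling rate, too low for a 5 kHz tone
n = np.arange(32)               # sample indices
tone_5k = np.cos(2 * np.pi * 5000 * n / fs)
tone_3k = np.cos(2 * np.pi * 3000 * n / fs)   # the alias: 8000 - 5000 = 3000 Hz

print(np.allclose(tone_5k, tone_3k))   # True: the two tones are indistinguishable
```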

One should notice that each sample needs to be rounded up or down to the nearest quantization level, which leads to what is called quantization error. This procedure actually distorts the original signal. Quantization noise is the analog signal that can be built out of the randomly distributed quantization errors.
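A minimal sketch (same assumed tone and levels as above) that makes the quantization error explicit as a per-sample signal:

```python
import numpy as np

levels = 32                                    # 5-bit quantization, as above
t = np.arange(0, 0.01, 1.0 / 8000)
analog = 0.8 * np.sin(2 * np.pi * 440 * t)     # hypothetical input in [-1, 1]

step = 2.0 / (levels - 1)                      # quantization interval over [-1, 1]
quantized = np.round(analog / step) * step     # round each sample to the nearest level
error = quantized - analog                     # quantization error, sample by sample

print("max |error| :", np.abs(error).max())    # at most half a quantization interval
print("noise power :", np.mean(error ** 2))    # power of the quantization noise
```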

In order to reconstruct the analog signal from its digital representation we need to interpolate the values of the samples into a continuous time-varying signal. A bandlimiting filter is often employed to perform this interpolation.

Figure 4 shows a typical audio encoder. Basically we have a bandlimiting filter followed by an Analog-to-Digital Converter (ADC). The converter is composed of a circuit which samples the original signal, as dictated by a sampling clock, and holds the sampled value so that the next component, a quantizer, can receive it. The quantizer, in its turn, receives the sampled value and outputs the equivalent bitword. The bandlimiting filter is employed to ensure that the ADC won't receive any component whose frequency would require a Nyquist rate higher than the sampling clock of the encoder. That is, the bandlimiting filter cuts off frequencies which are higher than half of the sampling clock frequency.

The audio decoder is a simpler device, composed of a Digital-to-Analog Converter (DAC), which receives the bitwords and generates a signal that holds each sample value during one sampling interval, until the next value is decoded. Such a “square” signal then goes through a low-pass filter, also known as a reconstruction filter, which smooths it out into what would be equivalent to a continuous-time interpolation of the sample values.
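A rough sketch of these decoder stages, with an assumed windowed-sinc low-pass standing in for the reconstruction filter (the article does not specify a filter design):

```python
import numpy as np

fs = 8000                     # original sampling rate (assumed)
upsample = 8                  # DAC output grid: 8 points per sampling interval
samples = 0.8 * np.sin(2 * np.pi * 440 * np.arange(0, 0.01, 1.0 / fs))  # decoded values

# Hold each sample value for one full sampling interval: the "square" signal
held = np.repeat(samples, upsample)

# Windowed-sinc low-pass FIR with cutoff at fs/2, acting as the reconstruction
# filter that smooths the staircase into a continuous-looking waveform
cutoff = 0.5 / upsample                       # fs/2, normalized to the DAC output rate
n = np.arange(-64, 65)
kernel = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(len(n))
smooth = np.convolve(held, kernel, mode="same")

print(smooth[:5])             # the smoothed, reconstructed signal
```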

The Human Hearing/Vocal Systems and Audio Coding

The human hearing system is capable of detecting sounds whose components are in the 20 Hz to 20 kHz range. The human voice, on the other hand, can be characterized by the 50 Hz to 10 kHz range. For that reason, when we need to digitize human voice, a 20 ksps (thousand samples per second) sampling rate is sufficient according to the Nyquist Sampling Theorem. More generally, since we can't hear sinusoidal components beyond 20 kHz, a generic sound such as music can be properly digitized using a 40 ksps sampling rate.

The above-mentioned characteristics of the human audio-oriented senses can be used to classify sound processes as follows:

a) Infra sonic: 0 to 20 Hz;

b) Audiosonic: 20 Hz to 20 kHz;

c) Ultrasonic: 20 kHz to 1 GHz; and

d) Hypersonic: 1 GHz to 10 THz.

The human hearing system is not linearly sensitive to all frequencies in the audiosonic range. The curve in Figure 5 shows the typical hearing sensitivity across the various frequencies.

With regard to the quantization levels, using linear quantization intervals, it is usual to use 12 bits per sample for voice encoding and 16 bits per sample for music. For multi-channel music we'd use 16 bits for each channel. We then find that we would need, respectively, 240 kbps, 640 kbps, and 1280 kbps for digitally encoded voice, mono music, and stereo music. In practice, however, since the available network bitrates are much lower than those mentioned herein, we will most often use a lower sampling rate and fewer quantization levels. For telephone-quality audio encoding, for instance, it is common to sample at 8 ksps, obviously cutting off sinusoidal components with frequencies over 4 kHz in order to comply with the Nyquist sampling theorem.
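The arithmetic behind those figures is simply bit rate = sampling rate × bits per sample × number of channels; the tiny helper below (a hypothetical function, not from the article) reproduces them:

```python
def pcm_bitrate(samples_per_second, bits_per_sample, channels=1):
    """Return the raw PCM bit rate in bits per second."""
    return samples_per_second * bits_per_sample * channels

print(pcm_bitrate(20_000, 12))       # voice:        240,000 bps = 240 kbps
print(pcm_bitrate(40_000, 16))       # mono music:   640,000 bps = 640 kbps
print(pcm_bitrate(40_000, 16, 2))    # stereo music: 1,280,000 bps = 1280 kbps
print(pcm_bitrate(8_000, 8))         # telephone quality, assuming 8 bits/sample: 64 kbps
```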

 
