
MPEG-1 Video Compression - Key Compression Tools for MPEG Video, MPEG-1 Video Specific Semantics and Syntax


Definition: MPEG-1 video and audio compression standards were mainly devised for CD-ROM applications at 1.5 Mbps.

The MPEG-1 video compression algorithm was optimized for a bitrate of about 1.1 Mbps, because video coded at about 1.1 Mbps plus stereo audio coded at 128 kbps together fit the CD-ROM data rate of approximately 1.4 Mbps. MPEG-1 uses the Source Input Format (SIF) for optimal performance. The SIF variants corresponding to NTSC and PAL are 352×240 at 29.97 fps and 352×288 at 25 fps, respectively.
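To put the 1.1 Mbps target in perspective, the sketch below computes the uncompressed bitrate of both SIF variants. It assumes 8-bit samples and 4:2:0 chroma subsampling (each chroma plane at half width and half height); the function name `raw_bitrate_bps` is hypothetical and used only for illustration.

```python
# Raw (uncompressed) bitrate of SIF video, assuming 8-bit samples and
# 4:2:0 chroma subsampling (U and V planes are half width, half height).

def raw_bitrate_bps(width: int, height: int, fps: float, bits_per_sample: int = 8) -> float:
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # U plane + V plane
    return (luma + chroma) * bits_per_sample * fps

VIDEO_BUDGET_BPS = 1.1e6  # MPEG-1 video target quoted in the text
AUDIO_BPS = 128e3         # stereo audio rate quoted in the text

for name, (w, h, fps) in {"NTSC SIF": (352, 240, 29.97), "PAL SIF": (352, 288, 25.0)}.items():
    raw = raw_bitrate_bps(w, h, fps)
    print(f"{name}: raw {raw / 1e6:.2f} Mbps, ratio {raw / VIDEO_BUDGET_BPS:.1f}:1")

print(f"video + audio: {(VIDEO_BUDGET_BPS + AUDIO_BPS) / 1e6:.3f} Mbps")
```

Under these assumptions both SIF variants come out at roughly 30.4 Mbps raw (30.38 and 30.41 Mbps), so the 1.1 Mbps video budget implies about 27.6:1 compression at SIF resolution, and video plus audio totals 1.228 Mbps.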

Key Compression Tools for MPEG Video

All MPEG standards are based on motion-compensated transform coding, and their three main compression tools are: 1. color conversion to YUV with downsampling of the U and V components, 2. spatial de-correlation, and 3. temporal de-correlation. Video is encoded one macroblock (MB) at a time; each MB covers a 16×16 region of the luma component together with the corresponding chroma components. First, the U and V components are downsampled to reduce the resolution of the color information. Second, a transform (the DCT) removes spatial redundancy. The DCT was chosen as the de-facto transform in most MPEG standards because it has been shown to approximate the optimal Karhunen-Loève transform (KLT) for a first-order Markov source. The 2-D transform size is 8×8. Third, motion estimation and compensation remove temporal redundancy: the encoder takes the difference between the current MB and its best-match MB (a.k.a. prediction MB) in the reference frame and sends only this residual data (a.k.a. prediction error). The residual is then spatially de-correlated again with the 2-D transform. The best-match MB is described in the bitstream by motion vectors, which are semantic data rather than pixel data and give the displacement between the current MB and its best match. Motion estimation and compensation operate on 16×16 blocks in luma resolution.

Figure 1 shows the frame structure and prediction modes supported in MPEG-1. For better temporal de-correlation, bi-directional prediction is used; it is very effective when occlusion occurs between neighboring frames. B frames, which use bi-directional prediction, can choose the better of two references (past and future). Which reference is better typically depends on factors such as which reference frame is closer to the current frame and where occlusion happens in the scene. There are three prediction modes for MBs in B frames.
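The standard specifies how a decoder applies motion vectors, not how an encoder finds them. One simple encoder strategy is exhaustive (full-search) block matching; the sketch below assumes the sum of absolute differences (SAD) as the matching criterion, integer-pel accuracy, and a ±7 search range, all of which are illustrative choices. The helper names `sad`, `full_search` and `residual` are hypothetical.

```python
import random

def sad(cur, ref, bx, by, rx, ry, n):
    # Sum of absolute differences between the n×n block at (bx, by) in cur
    # and the n×n block at (rx, ry) in ref.
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def full_search(cur, ref, bx, by, search_range=7, n=16):
    """Return (dx, dy, sad) of the best integer-pel match for the block at (bx, by)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

def residual(cur, ref, bx, by, dx, dy, n=16):
    # Prediction error that would be passed on to the DCT stage.
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i] for i in range(n)]
            for j in range(n)]

# Synthetic test: the current MB is a copy of a displaced region of the reference.
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(48)] for _ in range(48)]
cur = [[0] * 48 for _ in range(48)]
for y in range(16, 32):
    for x in range(16, 32):
        cur[y][x] = ref[y - 2][x + 3]

print(full_search(cur, ref, 16, 16))  # → (3, -2, 0)
```

Full search is the simplest strategy to reason about but also the most expensive; practical encoders often use faster search patterns instead.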
Each MB in a B frame can adaptively declare which of the three modes it uses: 1. forward, 2. backward, or 3. interpolative. In other words, the mode is chosen according to which reference frame the best-match MB comes from. In interpolative mode, the best-match MB is the pixel-wise average of the two MBs found in the two prediction directions.
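A minimal sketch of the per-MB mode decision in a B frame, assuming the encoder already has a forward and a backward prediction block and simply keeps whichever of the three candidates has the lowest SAD. The SAD criterion and the round-half-up averaging are assumptions for illustration; `choose_b_mode` and its helpers are hypothetical names.

```python
def block_sad(a, b):
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def average_block(f, b):
    # Pixel-wise average of forward and backward predictions (round half up, assumed).
    return [[(x + y + 1) // 2 for x, y in zip(rf, rb)] for rf, rb in zip(f, b)]

def choose_b_mode(cur, fwd_pred, bwd_pred):
    candidates = {
        "forward": fwd_pred,
        "backward": bwd_pred,
        "interpolative": average_block(fwd_pred, bwd_pred),
    }
    # Ties go to the first mode listed, because min() keeps the first minimum.
    mode = min(candidates, key=lambda m: block_sad(cur, candidates[m]))
    return mode, candidates[mode]

cur = [[100, 100], [100, 100]]
fwd = [[90, 90], [90, 90]]
bwd = [[110, 110], [110, 110]]
print(choose_b_mode(cur, fwd, bwd)[0])  # → interpolative
```

Here neither reference alone matches the current block, but their average does, which is the situation interpolative mode is meant to exploit.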

Digitized NTSC-resolution video has a bit rate of approximately 100 Mbps. When MPEG compression is applied to produce a 1.1 Mbps output, the compression ratio is therefore about 100:1. In other words, almost 99% of the data is discarded in the compression stage, leaving only about 1% as net video information; conversely, the decoder must recreate approximately 99% of the data from about 1% of net information.
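The arithmetic behind these round figures, using only the two rates quoted above:

```latex
\frac{100\ \text{Mbps}}{1.1\ \text{Mbps}} \approx 91 \;(\text{i.e., roughly } 100{:}1),
\qquad
\frac{1.1\ \text{Mbps}}{100\ \text{Mbps}} = 1.1\% \approx 1\%
```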

MPEG-1 Video Specific Semantics and Syntax

MPEG-1 is a hybrid coder based on the 8×8 block DCT and 16×16 motion compensation with half-pel resolution; 8×8 motion compensation is not used in MPEG-1. Half-pel motion compensation further reduces prediction errors. The decoder must apply exactly the same half-pel interpolation method as the encoder, and the interpolation is performed on reconstructed reference frames, even in the encoder, to eliminate “drift” between encoder and decoder. The interpolation adopted in MPEG-1 is simple bilinear interpolation of adjacent integer-pels. MPEG-1 supports a maximum MV range of -512 to +511.5 pixels at half-pel resolution and -1024 to +1023 pixels at integer-pel resolution. Half-pel motion compensation is typically quite effective when the picture size is relatively small, as in cell phone videos.
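A minimal sketch of bilinear half-pel prediction: a sample halfway between two integer-pels is their average, and a sample at the center of four integer-pels is the average of all four. Coordinates are expressed in half-pel units. The round-half-up integer rounding shown here is an assumption (it agrees with round-to-nearest for non-negative pixel values); the function names are hypothetical.

```python
def half_pel_sample(ref, x2, y2):
    """Sample ref at (x2 / 2, y2 / 2); x2 and y2 are in half-pel units.

    Assumes non-negative coordinates whose neighbours (x + 1, y + 1, when
    needed) lie inside the frame.
    """
    x, y = x2 // 2, y2 // 2
    hx, hy = x2 % 2, y2 % 2
    a = ref[y][x]
    if not hx and not hy:          # integer-pel position
        return a
    if hx and not hy:              # horizontal half-pel: average of 2 pels
        return (a + ref[y][x + 1] + 1) // 2
    if hy and not hx:              # vertical half-pel: average of 2 pels
        return (a + ref[y + 1][x] + 1) // 2
    # diagonal half-pel: average of 4 pels
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4

def predict_block(ref, bx, by, mvx2, mvy2, n):
    # n×n prediction block for the block at (bx, by) displaced by a half-pel MV.
    return [[half_pel_sample(ref, 2 * (bx + i) + mvx2, 2 * (by + j) + mvy2)
             for i in range(n)] for j in range(n)]

ref = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
print(predict_block(ref, 0, 0, 1, 0, 2))  # → [[5, 15], [35, 45]]
```

Because the encoder runs this on its own reconstructed reference frames, it forms exactly the same prediction the decoder will, which is what prevents drift.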

For Non-Intra MBs, the DCT is performed on the residual data obtained through motion prediction, while for Intra MBs it is applied to the original pixel data. After the 2-D DCT is applied to each block, the DCT coefficients are quantized so that more of them become zero, producing zero runs that run-length coding (RLC) can represent compactly. Quantizing with a larger Quantization Scale (QS) generates more zero runs. Quantization is the only stage in the entire encoder/decoder system where errors are deliberately introduced; the purpose is to reach the compression ratio a specific application demands. The degree of quality degradation depends on the QS chosen for each MB, which is why the MB header can carry QS information. The MB header consists of the MB type, QS, MV data and the coded block pattern. QS is optional in the MB header because a mandatory QS is carried in the Slice layer. Two different weighting matrices, one for Intra and one for Non-Intra MBs, are applied in DCT coefficient quantization. A weighting matrix assigns different bit resolution to different frequency components according to the sensitivity of the human visual system.
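The sketch below shows the 8×8 2-D DCT followed by quantization with a QS and a weighting matrix. The quantizer is deliberately simplified (a plain rounded division by `qs * weight / 16`) and does not reproduce the exact MPEG-1 rounding rules for Intra and Non-Intra blocks; the flat matrix of 16s is used purely for illustration. All names are hypothetical.

```python
import math

N = 8

def dct_matrix(n=N):
    # Orthonormal DCT-II basis: C[k][i] = a(k) * cos((2i + 1) * k * pi / (2n)).
    rows = []
    for k in range(n):
        a = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([a * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(r) for r in zip(*m)]

C = dct_matrix()

def dct2(block):
    # Separable 2-D DCT: F = C · X · C^T.
    return matmul(matmul(C, block), transpose(C))

def quantize(coefs, qs, weights):
    # Simplified quantizer: a larger qs or weight gives a coarser step and more zeros.
    # With the flat matrix of 16s this reduces to round(coef / qs).
    return [[round(16 * coefs[v][u] / (qs * weights[v][u])) for u in range(N)]
            for v in range(N)]

FLAT_WEIGHTS = [[16] * N for _ in range(N)]  # illustrative flat weighting matrix

def count_zeros(levels):
    return sum(row.count(0) for row in levels)

block = [[(x * y * 7) % 32 for x in range(N)] for y in range(N)]  # synthetic texture
F = dct2(block)
print("zeros at QS=2: ", count_zeros(quantize(F, 2, FLAT_WEIGHTS)))
print("zeros at QS=31:", count_zeros(quantize(F, 31, FLAT_WEIGHTS)))
```

Running both quantizations on the same coefficients shows the trade-off described above: the larger QS never yields fewer zero coefficients, at the cost of larger reconstruction error.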

The body of an MB contains six blocks (four luma and two chroma), each consisting of the quantized DCT coefficients of an 8×8 block. At the block level, the DC coefficient is coded as the difference between it and a prediction formed from the DC coefficient of the previously coded block of the same component. This difference is coded with a variable number of bits, from 1 to 8. For blocks in Non-Intra MBs, the DC coefficient is treated just like the AC coefficients. AC coefficients are coded by zig-zag scanning the block to build the RLC. Each entry of the AC RLC Huffman table is indexed by a (zero-RUN, LEVEL) pair. Half of the AC RLC Huffman table in MPEG-1 is identical to that of H.261.
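A minimal sketch of the block-level steps: generating the 8×8 zig-zag order, turning the scanned coefficients into (zero-RUN, LEVEL) pairs, and coding DC values differentially. The Huffman (VLC) tables themselves are not reproduced. `start=1` skips the separately coded DC of intra blocks, while `start=0` treats DC like an AC coefficient as in Non-Intra blocks. The initial DC predictor is a parameter here; the actual reset value defined by the standard is not shown. All names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions in zig-zag scan order, walking the anti-diagonals."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        for r in rows:
            order.append((r, s - r))
    return order

def run_level_pairs(block, start=1):
    """(run, level) pairs in zig-zag order, beginning at scan index `start`."""
    coeffs = [block[r][c] for r, c in zigzag_order(len(block))][start:]
    pairs, run = [], 0
    for level in coeffs:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs  # any trailing zeros are signalled by an end-of-block code instead

def dc_differentials(dc_values, predictor):
    """Code each DC as the difference from the previous block of the same component."""
    diffs = []
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc
    return diffs

blk = [[0] * 8 for _ in range(8)]
blk[0][0], blk[0][1], blk[2][0], blk[0][2] = 50, 3, -2, 1
print(run_level_pairs(blk))                 # → [(0, 3), (1, -2), (1, 1)]
print(dc_differentials([100, 104, 98], 96))  # → [4, 4, -6]
```

The zig-zag scan visits low frequencies first, so after quantization the nonzero values tend to cluster at the start and the long zero tail collapses into a single end-of-block code.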

There are six layers of headers in the MPEG-1 video bitstream syntax: Sequence, GOP, Picture, Slice, MB and Block. The Sequence header contains basic parameters such as the picture size, pel aspect ratio (PAR), picture rate, bit rate, VBV buffer size and some other global parameters. The GOP header supports random access and fast searching, which are made possible by I pictures. The Picture header indicates the picture type: I, P, B or D. MPEG-1 allows a special D picture, which consists of blocks containing only the DC coefficient. D pictures are rarely used; they were designed for fast search, since a DC component alone needs no IDCT computation at all. The Picture header carries two important parameters: temporal reference and vbv_delay. The temporal reference indicates the position of the picture in display order within the GOP. The vbv_delay indicates how long the decoder should wait after a random access so that its bitstream buffer neither underflows nor overflows.
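Because a B picture needs its future reference before it can be decoded, pictures are transmitted in a different order from the one in which they are displayed, and the temporal reference lets the decoder restore display order. The sketch below models each picture as a hypothetical `(type, temporal_reference)` tuple and reorders a display-order sequence into coding order by holding B pictures back until the next I or P anchor has been emitted.

```python
def coding_order(display):
    """Reorder (type, temporal_reference) pictures from display to coding order."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)   # wait for the future anchor picture
        else:                       # I, P (or D) picture: emit it, then the held B pictures
            out.append(pic)
            out.extend(pending_b)
            pending_b = []
    out.extend(pending_b)           # fallback for trailing B pictures with no later anchor
    return out

display = [("I", 0), ("B", 1), ("B", 2), ("P", 3), ("B", 4), ("B", 5), ("P", 6)]
coded = coding_order(display)
print(coded)
# → [('I', 0), ('P', 3), ('B', 1), ('B', 2), ('P', 6), ('B', 4), ('B', 5)]

# A decoder can recover display order by sorting on the temporal reference.
print(sorted(coded, key=lambda p: p[1]) == display)  # → True
```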
