MPEG-2 Video Compression - MPEG-2 Video New Features with respect to MPEG-1

Definition: Following MPEG-1, the MPEG-2 video and audio compression standards were developed to meet the needs of entertainment TV, both for transmission media such as satellite and CATV and for digital storage media such as DVD.

The main effort was to compress CCIR-601 4:2:0 interlaced video with high quality, since virtually all TV material archived over the last 50 years is interlaced. Since the resolution is approximately 4 times that of CIF, the bit rate chosen for optimizing MPEG-2 was 4 Mbps. For the first time, functionality tools other than compression tools were specified in MPEG-2: the scalability tools. The scope of the MPEG-2 standard was enlarged to cover HD applications in July 1992. MPEG-2 now handles a range of 4-15 Mbps as a generic coder. "Generic" means that the coding is not targeted at a specific application; instead, the standard includes many algorithms and tools that can be used for a variety of applications under different operating conditions. To provide interoperability between such different MPEG-2 applications, profiles and levels were defined.

MPEG-2 Video New Features with respect to MPEG-1

Like MPEG-1, MPEG-2 is a hybrid coder based on the 8×8 block DCT and 16×16 motion compensation with half-pel resolution, which makes it a straightforward extension of MPEG-1. This was a key factor in the rapid acceptance of MPEG-2 in such a short period of time. However, many new features were added in MPEG-2. Frame pictures and Field pictures were defined; a video sequence in MPEG-2 is a collection of Frame pictures and Field pictures. Frame/Field adaptive motion compensation was introduced. Dual prime motion compensation was devised for P-pictures when no B-pictures are used. Frame/Field/Dual prime adaptive motion compensation was developed for Frame pictures, while Field/16×8 MC/Dual prime adaptive motion compensation was devised for Field pictures. In terms of the DCT, Frame DCT and Field DCT were defined. A nonlinear quantization table with increased accuracy for small values was added alongside the linear quantization table. A new coefficient scanning pattern for the DCT, the alternate scan, was added; it was mainly designed for interlaced data. A new VLC table for the DCT coefficients of Intra blocks was introduced. In addition, the YUV 4:2:2 and YUV 4:4:4 input formats were also supported for high-quality studio applications.

In MPEG-2, IDCT mismatch control adds or subtracts 1 to coefficient [7][7] if the sum of all coefficients is even. Escape-coded coefficients support a larger range of quantized DCT values in the 24-bit MPEG-2 FLC than in the MPEG-1 FLC, owing to a more efficient representation. The coefficient VLC table escape format is not allowed if a shorter VLC can be used. Chrominance samples are horizontally co-sited with luminance samples. Slices always start and end on the same horizontal row of MBs. Concealment motion vectors for Intra MBs are possible. Motion vectors are always coded in half-pel units, so the full-pel flags must be zero in the Picture header. The display aspect ratio is specified in the bitstream, and the pel aspect ratio is derived from it.
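The mismatch control rule described above can be illustrated with a minimal Python sketch. The function name and the list-of-lists block layout are assumptions made for this example, and saturation of the coefficients, which a real decoder performs as well, is left out.

```python
def mismatch_control(coeffs):
    """Apply MPEG-2 style IDCT mismatch control to an 8x8 coefficient block.

    coeffs: list of 8 rows, each a list of 8 integer DCT coefficients
            (hypothetical layout chosen for this sketch).
    Returns a new block; the input is left unmodified.
    """
    out = [row[:] for row in coeffs]
    total = sum(sum(row) for row in out)
    if total % 2 == 0:
        # The sum is even: toggle the least significant bit of coefficient [7][7].
        if out[7][7] % 2 != 0:
            out[7][7] -= 1  # odd value: subtract 1
        else:
            out[7][7] += 1  # even value: add 1
    return out
```

After the call, the sum of all coefficients in the returned block is always odd. For an all-zero block, for instance, coefficient [7][7] becomes 1.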

MPEG-2 Video Specific Semantics and Syntax

The 8×8 motion compensation is not used in MPEG-2. However, Frame and Field pictures are defined and fully exploited to further reduce prediction errors for interlaced video. To encode interlaced video optimally, MPEG-2 can encode a picture either as a Field picture or as a Frame picture. In Field picture mode, the two fields of the frame are encoded separately. If the first field of a frame is an I picture, the second field can be either an I or a P picture; in other words, the second field can use the first field as a reference picture. However, if the first field is a P- or B-field picture, the second field has to be the same type of picture. In a Frame picture, the two fields are interleaved and coded together as one picture.
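The pairing rule for the two fields of a frame coded in Field picture mode can be written as a small check. This is only a sketch of the rule as stated above; the function name and the single-letter type strings are assumptions made for the example.

```python
def second_field_type_allowed(first, second):
    """Return True if `second` is a valid picture type for the second field
    of a frame whose first field has picture type `first`.

    Picture types are given as the strings "I", "P" or "B".
    """
    if first == "I":
        # An I first field may be followed by an I or a P second field.
        return second in ("I", "P")
    # A P or B first field must be followed by a field of the same type.
    return second == first
```

For example, `second_field_type_allowed("I", "P")` is `True`, while `second_field_type_allowed("P", "B")` is `False`.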

In MPEG-2, Frame-based and Field-based predictions are distinguished by the type of reference used. Frame-based prediction is obtained from reference frames, while Field-based prediction is obtained from reference fields. Note that in a Frame picture, either Frame-based prediction (one 16×16) or Field-based prediction (two 16×8) is selected on an MB basis. Whichever prediction is used, forward, backward and interpolative modes are available in each MB, so the best prediction mode can be chosen for each MB. For a Field picture, no Frame-based prediction is possible. For Field-based prediction in P Field pictures, the prediction is formed from the two most recently decoded fields. For Field-based prediction in B Field pictures, however, the prediction is formed from the two most recently decoded reference frames.

Note that a 16×16 area in a Field picture covers a 16×32 area in the Frame picture. That is too large an area to assume homogeneous behavior inside the block. Therefore, 16×8 prediction was introduced for Field pictures. Two MVs are used for each MB: the first MV is applied to the upper 16×8 half of the MB and the second MV to the lower 16×8 half.

The idea of Dual prime adaptive motion prediction is to send minimal differential MV information for adjacent field MV data; this looks very similar to the direct mode in later standards such as H.263 and MPEG-4. For two adjacent fields, the MV data look very similar in most cases, so compressing and sending two independent MVs would be too expensive. Dual prime motion prediction exploits the geometrical similarity between an MV in the current field and the MV already sent for the previous field. Given the first MV, the second MV is geometrically derived from it, and only a differential MV for fine adjustment is sent for the second one. To reduce the noise in the data, the same MV derivation is also applied to the opposite parity in order to capture averaged pixel values for the prediction. In other words, the motion-compensated predictions from the two reference fields are averaged.

Two DCT modes, Frame DCT and Field DCT, were introduced in MPEG-2. In Frame DCT, the luminance MB is broken down into four 8×8 DCT blocks. In Field DCT, the luminance MB is also divided into four DCT blocks, but the pixels from the same field are grouped together.
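The difference between Frame DCT and Field DCT block formation can be sketched in Python as follows. The function name, the list-of-rows macroblock layout and the block ordering are assumptions made for this illustration; the sketch only shows how the luminance lines are regrouped before the 8×8 DCT, not the transform itself.

```python
def split_luma_macroblock(mb, field_dct):
    """Split a 16x16 luminance macroblock into four 8x8 blocks.

    mb:        list of 16 rows, each a list of 16 samples.
    field_dct: False for Frame DCT, True for Field DCT.
    Returns the blocks in the order
    [upper-left, upper-right, lower-left, lower-right].
    """
    if field_dct:
        # Group lines of the same field together: the even lines (one field)
        # feed the two upper blocks, the odd lines (the other field) feed
        # the two lower blocks.
        rows = mb[0::2] + mb[1::2]
    else:
        # Frame DCT: keep the interleaved line order as it is.
        rows = mb
    blocks = []
    for r0 in (0, 8):
        for c0 in (0, 8):
            blocks.append([row[c0:c0 + 8] for row in rows[r0:r0 + 8]])
    return blocks
```

With Frame DCT, the upper-left block holds macroblock lines 0 to 7. With Field DCT, it holds lines 0, 2, ..., 14, and the lower-left block holds lines 1, 3, ..., 15.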

The interpolation method adopted for MPEG-2 is simple bilinear interpolation of adjacent integer-pels, as in MPEG-1. MPEG-2 supports a maximum MV range of -2048 to +2047.5 pixels at half-pel resolution. The MB header is composed of the MB type, QS, MV data and coded block pattern. Note that QS in the MB header is optional since it is also carried in the Slice layer. Two different weighting matrices, for Intra and NonIntra blocks respectively, are applied to DCT coefficient quantization. An MB contains 6 blocks of data in its body, and a block consists of the quantized DCT coefficients of an 8×8 block. At the Block level, the DC value is coded differentially between the DC coefficient of the block and the prediction made from the DC coefficient of the previously coded block of the same component. The number of bits assigned to it ranges from 1 to 11. For blocks in NonIntra MBs, DC coefficients are treated just like the AC coefficients. There are 2 VLC tables: one for NonIntra coefficients and the other for Intra AC coefficients. The Intra AC coefficient table is used for Intra AC coefficients only when intra_vlc_format=1. The coding of AC coefficients is based on zig-zag scanning or alternate scanning to construct the RLC. The scanning order is chosen on a picture-by-picture basis. The mapping index of the AC RLC Huffman table is the (zero-RUN, LEVEL) pair. The NonIntra RLC Huffman table in MPEG-2 is exactly the same as that of MPEG-1. The dct_type flag in each MB of a Frame picture indicates whether the MB is Frame DCT coded or Field DCT coded. Note that no Field DCT is needed for Field pictures because each field is captured at one time instant.
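The bilinear half-pel interpolation mentioned at the start of this passage can be sketched as follows. The function name and the way coordinates are passed are assumptions for this example. The rounding shown (to nearest, with halves rounded up) reflects the usual description of MPEG half-pel prediction and should be checked against the standard before being relied on. Bounds checking at picture edges is omitted.

```python
def predict_half_pel(ref, x2, y2):
    """Return the predicted sample at half-pel position (x2 / 2, y2 / 2).

    ref:    reference picture as a 2D list of integers, indexed ref[y][x].
    x2, y2: non-negative coordinates expressed in half-pel units.
    """
    x, y = x2 >> 1, y2 >> 1    # integer-pel part of the position
    hx, hy = x2 & 1, y2 & 1    # half-pel flags (1 means "halfway")
    a = ref[y][x]
    if not hx and not hy:
        return a                                   # integer position
    if hx and not hy:
        return (a + ref[y][x + 1] + 1) >> 1        # horizontal half-pel
    if hy and not hx:
        return (a + ref[y + 1][x] + 1) >> 1        # vertical half-pel
    # Both directions halfway: average the four surrounding integer-pels.
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) >> 2
```

For the 2×2 reference `[[10, 20], [30, 41]]`, the horizontal half-pel sample is 15, the vertical one is 20, and the centre sample is 25.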

There are 6 levels of headers in the MPEG-2 video bitstream syntax: Sequence, GOP, Picture, Slice, MB and Block. The Sequence header contains basic parameters such as the size of the video pictures, PAR or IAR, picture rate, bit rate, VBV buffer size, QS type (linear or non-linear) and some other global parameters. The GOP header provides support for random access and fast searching by means of I pictures. It carries the SMPTE time code, closed_gop and broken_link information. The Picture header contains information about the picture type, such as I, P or B, along with parameters such as the temporal reference and vbv_delay. Importantly, the full_pel_forward_vector and full_pel_backward_vector flags should always be set to 0, since half-pel resolution is mandatory in MPEG-2.


The purpose of scalable video is to provide video at more than one resolution, quality or implementation complexity simultaneously. MPEG-2 supports four scalability modes: SNR, spatial, temporal and data partitioning. These allow different tradeoffs among bandwidth, video resolution or quality, and overall implementation complexity. Data partitioning is bitstream scalability, where a single coded video bitstream is artificially partitioned into two or more layers. SNR scalability is quality scalability, where each layer is at a different quality but at the same spatial resolution. Spatial scalability is spatial resolution scalability, where each layer has the same or a different spatial resolution. Temporal scalability is frame rate scalability, where each layer has the same or a different temporal resolution but the same spatial resolution.

In a basic MPEG-2 scalability mode, there can be two layers of video: a base layer and an enhancement layer. Data partitioning splits the block of 64 quantized transform coefficients into two partitions. The base partition holds the low-frequency DCT coefficients, which are usually the most important. This yields a more important bitstream and a less important one. SNR scalability transmits coarsely quantized DCT coefficients in the base layer. The difference between the non-quantized DCT coefficients and the base layer DCT coefficients is encoded with a finer quantization step size in the enhancement layer. This yields video at two quality levels. Note that the spatial resolution is not changed in the enhancement layer in SNR scalability. Spatial scalability down-samples the video and transmits it in the base layer bitstream. To generate a prediction in the enhancement layer, spatial scalability upsamples the base layer video signal, then weights and combines it with the motion-compensated prediction from the enhancement layer. The weighting factor producing the smallest prediction error is selected and identified on an MB basis, and this information is put into the enhancement layer bitstream. This yields video at two spatial resolutions. Temporal scalability skips frames and transmits the remainder to generate a lower frame rate in the base layer bitstream. To generate a prediction in the enhancement layer, temporal scalability uses base layer pictures as references for motion-compensated prediction. This yields video at two temporal resolutions (different frame rates).
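The two-layer idea behind SNR scalability, as described above, can be illustrated with a toy Python sketch. The uniform quantizer, the function names and the step sizes are all assumptions chosen for illustration; they are not the MPEG-2 quantizer or its syntax.

```python
import math


def quantize(value, step):
    """Uniform quantizer, rounding to the nearest level (illustrative only)."""
    return math.floor(value / step + 0.5)


def encode_snr_layers(coeffs, base_step, enh_step):
    """Quantize coefficients coarsely for the base layer, then quantize the
    remaining error with a finer step for the enhancement layer."""
    base_levels = [quantize(c, base_step) for c in coeffs]
    base_recon = [q * base_step for q in base_levels]
    enh_levels = [quantize(c - r, enh_step) for c, r in zip(coeffs, base_recon)]
    return base_levels, enh_levels


def decode_snr_layers(base_levels, enh_levels, base_step, enh_step):
    """Return (base-only reconstruction, base + enhancement reconstruction)."""
    base = [q * base_step for q in base_levels]
    both = [b + e * enh_step for b, e in zip(base, enh_levels)]
    return base, both
```

With coefficients `[100, -37, 5, 0]`, a base step of 16 and an enhancement step of 4, the base layer alone reconstructs `[96, -32, 0, 0]`, while both layers together reconstruct `[100, -36, 4, 0]`. The spatial resolution is identical in both cases; only the coefficient accuracy differs.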

Two different scalability modes can be combined to generate 3 layers – base layer, enhancement layer 1 and enhancement layer 2. Note that enhancement layer 1 is the lower layer for enhancement layer 2. Such an extended scalability is defined as Hybrid scalability.

User Comments

Parts of MPEG-2
MPEG-2 is a standard currently in ten parts. One part was withdrawn because there was no demand for it in the industry.
Part 1 of MPEG-2 addresses the combining of one or more elementary streams of video and audio, as well as other data, into single or multiple streams that are suitable for storage or transmission.
Part 2 of MPEG-2 builds on the powerful video compression capabilities of the MPEG-1 standard to offer a wide range of coding tools. These have been grouped in profiles to offer different functionalities.
Part 3 of MPEG-2 is a backwards-compatible multi-channel extension of the MPEG-1 Audio standard.
Parts 4 and 5 of MPEG-2 correspond to Parts 4 and 5 of MPEG-1. They were finally approved in March 1996.
Part 6 of MPEG-2 - Digital Storage Media Command and Control (DSM-CC) is the specification of a set of protocols, which provides the control functions and operations specific to managing MPEG-1 and MPEG-2 bitstreams. These protocols may be used to support applications in both stand-alone and heterogeneous network environments. In the DSM-CC model, a stream is sourced by a Server and delivered to a Client.
Part 7 of MPEG-2 is the specification of a multi-channel audio coding algorithm that is not constrained to be backwards-compatible with MPEG-1 Audio. The standard was approved in April 1997.
Part 8 of MPEG-2 was originally planned to be coding of video when input samples are 10 bits. Work on this part was discontinued when it became apparent that there was insufficient interest from industry for such a standard.
Part 9 of MPEG-2 is the specification of the Real-time Interface (RTI) to Transport Stream decoders, which may be utilized for adaptation to all appropriate networks carrying Transport Streams.
Part 10 of MPEG-2 is the conformance testing part of DSM-CC, and it is still under development.