
JPEG 2000 Image Coding Standard - Abstract, Introduction, The JPEG 2000 Coding Algorithm, The Image Transform, Quantization, Entropy coding


Hong Man
Stevens Institute of Technology, Hoboken, New Jersey, USA

Alen Docef
Virginia Commonwealth University, Richmond, Virginia, USA

Faouzi Kossentini
UB Video, Vancouver, British Columbia, Canada

Definition: JPEG 2000 is a wavelet-based image compression standard created by the Joint Photographic Experts Group committee with the intention of superseding their original DCT-based JPEG standard.

Abstract

Some of the major objectives of the JPEG 2000 still image coding standard were compression and memory efficiency, lossy to lossless coding, support for continuous-tone to bi-level images, error resilience, and random access to regions of interest. This presentation will provide readers with some insight into the basic coding structure of the baseline JPEG 2000 algorithm and the various features and functionalities supported by the standard. It can serve as a guideline for users to evaluate the effectiveness of JPEG 2000 for various applications, and to develop JPEG 2000 compliant software.

Introduction

Since the release of the Joint Photographic Experts Group (JPEG) still image coding standard in 1994, it has been widely adopted in applications involving digital communication and storage of still images and graphics. Motivated by the evolution of image coding technology and by an increasing field of applications, the JPEG committee initiated a new project in 1997 to develop the next generation still image coding standard. The joint effort of the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU-T) resulted in the JPEG 2000 International Standard, published in December 2000.

The original Call for Contributions for the JPEG 2000 standardization effort identified a set of coding features believed to be vital to many existing and emerging image applications. These were translated into goals for the new standard, as shown below.

  • The system should offer excellent compression performance at very low bit rates, typically 0.25 bits-per-pixel (bpp) or less.
  • The system should be able to compress both continuous-tone and bi-level images with similar system resources.
  • The system should provide lossy to lossless compression by means of a progressive coding process.
  • The system should be able to perform progressive coding in terms of both pixel accuracy and spatial resolution.
  • The system should produce code streams that are robust to channel errors.
  • The system should allow random access to and processing of certain parts of the image such as regions of interest.

Various techniques have been proposed to address these requirements. The standard eventually converged to a baseline coding system that achieves a good balance in supporting the desired features. This became Part 1 (ISO/IEC 15444-1: 2004) of the standard. Additional parts have since been developed and standardized to enhance the functionality of the baseline coding system, or to facilitate development of JPEG 2000 compliant applications. To date, the available parts include:

Part 2: Extensions on wavelet decomposition, quantization, region of interest encoding, and new data formats (ISO/IEC 15444-2: 2004).

Part 3: Motion JPEG 2000, which applies the JPEG 2000 image coding algorithm to individual frames of video sequences independently (ISO/IEC 15444-3: 2002).

Part 4: Test procedures for both encoding and decoding processes to validate conformance to JPEG 2000 Part 1 (ISO/IEC 15444-4: 2002).

Part 5: Reference implementations of JPEG 2000 Part 1, including JasPer (in C) and JJ2000 (in Java) (ISO/IEC 15444-5: 2003).

Part 6: Compound image format JPM for document imaging (ISO/IEC 15444-6: 2003).

Part 8: JPSEC addressing security aspects of the standard, such as encryption, source and content authentication, authorization, and IP protection (ISO/IEC FCD 15444-8).

Part 9: JPIP, an interactive protocol supporting image delivery over the Internet with features such as bit stream reordering, random access, and incremental decoding (ISO/IEC 15444-9: 2004).

Part 10: JP3D supporting volumetric coding of 3D data (ISO/IEC WD 15444-10).

Part 11: JPWL introducing mechanisms to protect the codestream from transmission errors, to describe error sensitivities within the codestream, and to indicate possible residual errors within the codestream (ISO/IEC FCD 15444-11).

Part 12: ISO Base media file format for timed sequence of media data (ISO/IEC 15444-12: 2005).

Part 13: An entry level JPEG 2000 encoder (ISO/IEC CD 15444-13).

The purpose of this presentation is to provide readers with some insight on the coding structure of JPEG 2000, its functional parameters, and their effect on coding performance. It is not our intention to discuss the detailed coding algorithm and its implementations in this presentation. For such information, the reader should refer to a tutorial on the coding engine by Taubman, a system-level introduction by Christopoulos et al., and the standard text.

The JPEG 2000 Coding Algorithm

JPEG 2000 is essentially a transform coder, which consists of three stages: image transform, quantization, and entropy coding. An image transform is used to achieve image data decorrelation. An efficient transformation yields a representation of the image data where the energy is concentrated in a small number of transform coefficients. Transform coefficients are then quantized to a finite number of quantization levels. It is during quantization that intentional loss of information occurs and most of the compression gain is achieved. Finally, the quantized coefficients are scanned and encoded into a bit stream using an entropy coder. An image decoder performs the inverse operations to obtain a reconstructed image. The coding engine of the JPEG 2000 standard is derived from the Embedded Block Coding with Optimal Truncation (EBCOT) technique proposed by Taubman; detailed descriptions of the codec are available in the literature. Currently there are three publicly available implementations of JPEG 2000: the JasPer implementation, the Kakadu implementation, and the JJ2000 implementation.

The Image Transform

The JPEG 2000 algorithm is based on the Discrete Wavelet Transform (DWT). It is therefore not compatible with the JPEG coding algorithm, which uses the two-dimensional (2-D) Discrete Cosine Transform (DCT). Studies have shown that a well designed DWT may have a moderate gain over the DCT in the sense of data decorrelation. More importantly, a significant improvement in performance is achieved by applying the transform to the whole image or large blocks thereof instead of small image blocks (JPEG uses 8 × 8-pixel blocks). The drawback of this approach is its increased computational complexity. The selection of the DWT at the beginning of the JPEG 2000 project essentially determined the coding structure of this new standard.

The 2-D DWT is usually implemented through iterative 2-D subband decomposition, as seen in Figure 1. First, the image is decomposed into four subbands, LL_1, LH_1, HL_1, and HH_1. The same 2-D subband decomposition is then applied to the lowest-frequency subband (LL_1) to obtain subbands LL_2, LH_2, HL_2, and HH_2. The process is repeated N times for an N-level decomposition, leaving LL_N as the final lowest-frequency subband. Commonly, the number of decomposition levels is around N = 5. This subband decomposition structure is called the Mallat (or dyadic, or pyramid) decomposition, and it is the only decomposition structure supported by Part 1 of the standard. The inverse DWT consists of N 2-D subband synthesis operations, starting at the lowest resolution.
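As a small illustration of this decomposition bookkeeping, the Python sketch below computes the subband dimensions produced by an N-level Mallat decomposition of a W × H tile. It assumes ceiling/floor division when splitting odd sizes and ignores the reference-grid offsets that the standard also factors in, so it is illustrative rather than normative:

```python
import math

def subband_sizes(width, height, levels):
    """Per-level subband dimensions for a Mallat (dyadic) decomposition.

    Sketch only: the low-pass half gets the ceiling of an odd split,
    the high-pass half the floor.  Exact JPEG 2000 subband coordinates
    also depend on the tile's offset on the reference grid, which is
    ignored here.
    """
    sizes = {}
    w, h = width, height
    for lev in range(1, levels + 1):
        lo_w, hi_w = math.ceil(w / 2), w // 2
        lo_h, hi_h = math.ceil(h / 2), h // 2
        sizes[f'LH{lev}'] = (lo_w, hi_h)   # low-pass horizontal, high-pass vertical
        sizes[f'HL{lev}'] = (hi_w, lo_h)   # high-pass horizontal, low-pass vertical
        sizes[f'HH{lev}'] = (hi_w, hi_h)   # high-pass in both directions
        w, h = lo_w, lo_h                  # recurse on the low-pass band
    sizes[f'LL{levels}'] = (w, h)          # final lowest-frequency subband
    return sizes
```

For a 101 × 64 tile and two levels, this yields LH1 of size 51 × 32 and an LL2 band of size 26 × 16, reflecting how each level roughly halves both dimensions of the low-pass band.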

A one-level 2-D subband decomposition is usually achieved by applying a one-dimensional (1-D) two-band decomposition to all the rows (or columns) of a 2-D array and then to all the columns (or rows) of the resulting array. Therefore, the decomposition generates four subbands, as shown in Figure 1. Each 1-D subband decomposition consists of filtering the input sequence with a low-pass/high-pass filter bank and downsampling the two filtered sequences by a factor of two. JPEG 2000 specifies two sets of 1-D filter banks. The reversible 5/3 filter bank performs a reversible integer-to-integer transformation which can be used to achieve lossless coding. The irreversible 9/7 filter bank performs a real-to-real transformation which is reversible in infinite precision but irreversible in finite precision. The resulting DWT achieves better energy compaction and is used in lossy coding.
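The reversible 5/3 transform is usually realized with a two-step lifting scheme. The sketch below is a minimal Python version of one 1-D decomposition level and its inverse, assuming an even-length integer signal and a simplified symmetric extension at the boundaries; it illustrates why the transform is exactly invertible in integer arithmetic, but it is not the sample-accurate normative procedure:

```python
def fwd53(x):
    """One level of the reversible 5/3 DWT via lifting (sketch).

    Assumes an even-length list of integers.  Boundary samples are
    mirrored, a simplified form of the symmetric extension used by
    JPEG 2000 Part 1.  Returns (lowpass, highpass) integer subbands.
    """
    n = len(x)
    m = n // 2
    # Predict step: high-pass coefficients from odd-indexed samples
    d = []
    for i in range(m):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # mirror at edge
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    # Update step: low-pass coefficients from even-indexed samples
    s = []
    for i in range(m):
        left = d[i - 1] if i > 0 else d[0]                   # mirror at edge
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d

def inv53(s, d):
    """Inverse of fwd53: undo the lifting steps in reverse order.

    Because each step adds or subtracts the same integer quantity,
    reconstruction is exact (lossless).
    """
    m = len(s)
    x = [0] * (2 * m)
    for i in range(m):
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(m):
        right = x[2 * i + 2] if 2 * i + 2 < 2 * m else x[2 * m - 2]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x
```

Because the update step reads only quantities that the inverse can recompute, `inv53(*fwd53(x))` returns `x` exactly for any even-length integer input, which is precisely the property that enables lossless coding.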

In addition to the DWT for spatial decorrelation, the standard also defines two sets of component transforms for multi-component images, such as color images. Again, an irreversible component transformation (ICT) is defined for lossy coding and a reversible component transformation (RCT) is defined for lossless coding.

Quantization

To quantize DWT coefficients, JPEG 2000 uses a uniform scalar quantizer with a dead zone around zero. This quantizer has been chosen mostly because of its simplicity: it can easily be implemented as a rounding operation. Each DWT coefficient a_b(u,v) in subband b is quantized to the integer q_b(u,v) using the formula:

q_b(u,v) = sign(a_b(u,v)) · floor( |a_b(u,v)| / Δ_b )

where Δ_b is the quantization step, which can vary from one subband to another and is encoded into the bit stream. The standard does not specify how the quantization step Δ_b is chosen, and different implementations can use various strategies. The decoder performs inverse quantization, described by the equation:

â_b(u,v) = ( |q_b(u,v)| + r ) · Δ_b · sign(q_b(u,v))   if q_b(u,v) ≠ 0,   and   â_b(u,v) = 0 otherwise,

where r ∈ [0, 1) is a decoder-chosen reconstruction offset (r = 1/2 reconstructs at the midpoint of the quantization interval).

In lossless coding, coefficients of reversible transforms are not quantized, or they can be thought of as quantized with a step size of one. In lossy compression, the quantization step sizes control the final size of the compressed data file, and thus the compression ratio. The larger the step sizes are, the smaller the compressed file size will be. Therefore, selecting the step sizes is a rate control issue and will be discussed in a later section.
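The dead-zone quantizer and a mid-point dequantizer can be sketched in a few lines of Python. The reconstruction offset r is a decoder-side choice, not mandated by the standard; r = 1/2 is a common default:

```python
import math

def quantize(a, delta):
    """Dead-zone scalar quantization of one DWT coefficient:
    q = sign(a) * floor(|a| / delta).
    The interval (-delta, delta) maps to 0, giving a dead zone
    twice as wide as the other quantization bins."""
    sign = (a > 0) - (a < 0)
    return sign * math.floor(abs(a) / delta)

def dequantize(q, delta, r=0.5):
    """Inverse quantization with reconstruction offset r in [0, 1).
    r = 0.5 places the reconstruction at the bin midpoint; the
    standard leaves this choice to the decoder."""
    if q == 0:
        return 0.0
    sign = (q > 0) - (q < 0)
    return sign * (abs(q) + r) * delta
```

For example, with Δ_b = 2.0 the coefficient 7.9 quantizes to index 3, and mid-point dequantization reconstructs it as 7.0; any coefficient with magnitude below 2.0 falls into the dead zone and reconstructs as 0.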

Entropy coding

The major differences between various wavelet-based image coding algorithms lie mostly in the lossless entropy coding stage. This is the stage that most strongly determines their differences in coding performance and features.

Bit-plane coding

To encode quantized DWT coefficients, JPEG 2000 uses a block-based bit-plane coding method with multiple passes per bit-plane. Each subband is partitioned into a set of non-overlapping rectangular code blocks of a fixed size. Each block is then encoded independently of other blocks. Within each block, the coding starts from the most significant non-zero bit-plane and proceeds to the least significant bit-plane. In each bit-plane, three coding passes are performed, namely the significance propagation pass, the magnitude refinement pass, and the cleanup pass. Each coding pass scans the block according to a scanning pattern, and generates coding symbols using a set of pre-defined rules. The scanning pattern, shown in Figure 2, has been chosen to facilitate efficient software and hardware implementations.

During the significance propagation pass, the encoder identifies the block coefficients which have been zero (or insignificant) at higher bit-planes but have non-zero (or significant) neighbors. It is assumed that these coefficients are more likely to become significant in the current bit-plane. For each tested coefficient, the test result is encoded into the bit stream (1 if significant, 0 if insignificant). If a new significant coefficient is detected, its sign bit is encoded immediately.

The magnitude refinement pass processes coefficients which have been found significant in previous bit-planes. For each such coefficient, the magnitude bit in the current plane is encoded into the bit stream.

The cleanup pass tests all the insignificant coefficients which have not already been tested in the significance propagation pass in the current bit-plane. It is expected that these coefficients are less likely to become significant because they do not have any significant neighbor. Therefore, to reduce the number of symbols generated in this pass, test results are run-length encoded when four consecutive coefficients in a scan pattern column are all insignificant. Once a significant coefficient is identified, its sign bit is also encoded immediately.

Each coefficient in a code block is coded once and only once in each bit-plane. The symbols generated by these three passes are usually passed through a binary adaptive arithmetic coder with context modeling. Specific context models are defined for the coding of each pass and for the coding of sign bits. Each context determines which adaptive probability model will be used in the arithmetic coding of the current symbol. The contexts are selected based on the significance of the neighboring coefficients (eight- or four-connected, depending on the pass). To simplify context calculation and achieve reliable probability estimation, the number of contexts is kept small. More specifically, the significance propagation pass uses nine contexts, the magnitude refinement pass uses three contexts, the cleanup pass uses the nine significance propagation contexts plus one extra context for run-length coding, and the sign bit coding uses five contexts. JPEG 2000 uses a binary adaptive arithmetic coder called the MQ coder, designed to reduce computational complexity. It is closely related to the well-known QM coder and is an offspring of the Q coder. Arithmetic coding is always initialized at the beginning of a code block and can optionally be reinitialized at the beginning of a coding pass.
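To make the three-pass structure concrete, the following Python sketch assigns each coefficient of a small block to one pass per bit-plane. It deliberately omits the column-oriented scan order, the run-length mode, the context models, and the MQ coder itself; it only demonstrates the invariant that every coefficient is coded exactly once per bit-plane:

```python
def classify_passes(mags, width):
    """Assign each coefficient to one coding pass per bit-plane (sketch).

    mags  : flat list of quantized coefficient magnitudes (non-negative ints)
    width : number of columns in the (hypothetical) code block
    Returns a list of (bit-plane, coefficient index, pass name) tuples.
    """
    n = len(mags)
    nplanes = max(mags).bit_length()
    significant = [False] * n
    schedule = []

    def neighbors(i):
        # eight-connected neighborhood within the block
        r, c = divmod(i, width)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                rr, cc = r + dr, c + dc
                j = rr * width + cc
                if 0 <= rr and 0 <= cc < width and j < n:
                    yield j

    for p in range(nplanes - 1, -1, -1):
        already_significant = [i for i in range(n) if significant[i]]
        coded = set()
        # Significance propagation: insignificant, but has a significant neighbor
        for i in range(n):
            if not significant[i] and any(significant[j] for j in neighbors(i)):
                schedule.append((p, i, 'significance'))
                coded.add(i)
                if (mags[i] >> p) & 1:
                    significant[i] = True
        # Magnitude refinement: significant before this bit-plane
        for i in already_significant:
            schedule.append((p, i, 'refinement'))
            coded.add(i)
        # Cleanup: everything not yet coded in this bit-plane
        for i in range(n):
            if i not in coded:
                schedule.append((p, i, 'cleanup'))
                if (mags[i] >> p) & 1:
                    significant[i] = True
    return schedule
```

In the most significant bit-plane no coefficient has a significant neighbor yet, so everything falls into the cleanup pass, exactly as in the real coder; in later planes, coefficients migrate into the significance propagation and refinement passes as significance spreads.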

Rate control

One rough method for rate control has been mentioned in the context of quantization. However, the multipass bit-plane coding approach described above allows for rate control with better precision and reliability. A distinguishing feature of JPEG 2000 is its post-compression rate-distortion (PCRD) optimization, or optimal bit stream truncation. Its purpose is to determine the optimal bit allocation for each code block under the constraint of a target bit rate.

During quantization, a very fine quantization step size Δ_b is used for all the coefficients inside a subband. Encoding of each code block results in a bit stream which represents all the quantized coefficients to a very fine scale. For most useful compression ratios, it is not possible to send all these streams in their entirety. Therefore, these streams are subject to truncation in order to meet the overall target bit rate. PCRD attempts to choose the truncation points in a way that minimizes the coding distortion for a target bit rate. The possible truncation points in a bit stream are at the end of each coding pass. If the bit stream of a certain block is truncated at a bit-plane that is N bit-planes higher than the least significant bit-plane, the effective quantization step of this block is 2^N · Δ_b.

During the coding process, whenever a coding pass is completed, the bit rate consumption and the reduction in distortion due to this coding pass are calculated and recorded. When the coding of a whole block is completed, the rates and distortions of all the passes are recorded in a rate-distortion (R-D) table. Similar tables are generated for all code blocks. PCRD optimization uses a Lagrangian optimization method to determine which rate-distortion pair is to be used for each code block. Details of the optimization routine are given in the literature.
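A common way to realize this Lagrangian allocation is to bisect on the multiplier λ: for a fixed λ, each block independently minimizes D + λR over its truncation points, and λ is adjusted until the total rate meets the budget. The sketch below assumes each block's (rate, distortion) points already lie on their convex hull, as the PCRD procedure also relies on, and that the budget is achievable:

```python
def pick_truncations(rd_tables, rate_budget):
    """Lagrangian truncation-point selection (sketch of PCRD-style allocation).

    rd_tables   : one list per code block of (rate, distortion) pairs,
                  rates increasing, distortions decreasing (hull points).
    rate_budget : total rate constraint, assumed achievable.
    Returns (chosen indices, total rate, total distortion).
    """
    def allocate(lam):
        # For a fixed multiplier, each block is optimized independently.
        picks, total_r, total_d = [], 0.0, 0.0
        for table in rd_tables:
            k = min(range(len(table)),
                    key=lambda j: table[j][1] + lam * table[j][0])
            picks.append(k)
            total_r += table[k][0]
            total_d += table[k][1]
        return picks, total_r, total_d

    lo, hi = 0.0, 1e12          # bisection bracket for lambda (assumption)
    for _ in range(100):
        mid = (lo + hi) / 2
        _, r, _ = allocate(mid)
        if r > rate_budget:
            lo = mid            # too many bits: penalize rate more
        else:
            hi = mid            # feasible: try a smaller penalty
    return allocate(hi)
```

Larger λ values penalize rate more heavily and push every block toward earlier truncation points, so the total rate is non-increasing in λ; bisection then finds the smallest multiplier whose allocation fits the budget.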

An embedded code stream is a code stream that can be decoded as a whole, or it can be truncated and decoded at various bit rates lower than the original bit rate. Therefore, the lower bit rate streams can be seen as embedded in a higher bit rate stream. This scalability feature can be helpful in applications involving multiple bandwidth channels or multiple decoding platforms. In this context, although the complete code stream is optimized in the R-D sense, an arbitrarily truncated stream may not be optimal at its reduced bit rate. To address this problem, EBCOT supports the concept of quality layers. During PCRD optimization, instead of performing one R-D optimization for a single target rate, the coder can perform a series of R-D optimizations from lower rates to higher rates. Each new optimization is built upon the previous optimal bit stream truncations, and each optimization generates a quality layer that will be appended to previous layers to form the final code stream. Therefore, any truncation of the code stream at the end of a quality layer will always be optimal in the R-D sense. The number of bits corresponding to each data block included in a layer is encoded in the code stream as header information.

Regions of interest

Many applications may require that some areas within an image (regions of interest, or ROI) be encoded with higher accuracy than the rest of the image. Therefore, at a certain total bit rate, the encoder should allocate more bits to ROI pixels and fewer bits to non-ROI (or background) pixels. This feature usually involves an ROI mask and a shift scale s. The ROI mask is a binary map defining an arbitrarily shaped region of interest. A set of smaller ROI masks is calculated for each subband according to the original mask. The shift scale s specifies how much the ROIs will be emphasized. It is used to shift the quantization indices of all ROI subband coefficients to higher bit-planes, effectively amplifying their magnitude by a factor of 2^s. This shift is corrected at the decoder. In Part 1 of the standard, the ROI implementation is based on the Maxshift method, in which the shift scale is always larger than the largest number of magnitude bit-planes of all the background regions. As a result, after the ROI shift, the least significant bit-plane of the ROIs is still higher than the most significant bit-plane of the background regions. This shift scale has to be coded into the code stream and sent to the decoder. Entropy coding of the shifted subband coefficients is performed as usual. Since the decoder uses bit-plane decoding, it will always complete the decoding of all ROI coefficients before it reaches the background coefficients. Therefore the decoder can determine the ROI mask as the set of significant coefficients after s bit-planes have been decoded; thus the ROI mask does not have to be transmitted.
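The Maxshift mechanism can be illustrated on quantized integer magnitudes. In this hedged sketch the encoder picks s just large enough that every shifted non-zero ROI coefficient exceeds every background coefficient, and the decoder recovers the mask purely from magnitudes (zero-valued ROI coefficients are indistinguishable from background, which is harmless since their value is unaffected); sign handling and entropy coding are omitted:

```python
def maxshift_encode(coeffs, mask):
    """Maxshift ROI shifting (sketch).

    coeffs : quantized integer coefficients
    mask   : True where the coefficient belongs to the ROI
    Chooses s so that 2**s exceeds every background magnitude, then
    scales ROI coefficients by 2**s.  Returns (shifted, s).
    """
    bg_max = max((abs(c) for c, m in zip(coeffs, mask) if not m), default=0)
    s = bg_max.bit_length()          # 2**s > every background magnitude
    shifted = [c << s if m else c for c, m in zip(coeffs, mask)]
    return shifted, s

def maxshift_decode(shifted, s):
    """Recover coefficients and the implicit ROI mask.

    Any coefficient with magnitude >= 2**s must be ROI, so the mask
    never needs to be transmitted; only s does.
    """
    mask = [abs(c) >> s != 0 for c in shifted]
    coeffs = [c >> s if m else c for c, m in zip(shifted, mask)]
    return coeffs, mask
```

For instance, with background magnitudes up to 7 the encoder picks s = 3, so every non-zero ROI coefficient lands at magnitude 8 or above and the decoder can separate the two populations by a single threshold.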

The JPEG 2000 data structure

The JPEG 2000 defines a set of data structures that standardizes the access and reference to all the data involved. One set of structures is related to the processing of image data, the other is related to the formation of the code stream.

The image data structure

The image data structure is shown in Figure 2. An image refers to the input of the encoder and the output of the decoder. The associated data structure is described below, with the components listed in decreasing order of their size.

Component: An image is first separated into one or more components. A component can be an arbitrary 2-D rectangular array of samples. The most common image components are the color components representing the three color (R, G, B) planes of an image. The inter-component transform defined in the standard is mainly used with RGB images. However, other decompositions are possible for multi-component images such as radar images (in phase and quadrature) or printer output images (the CMYK color space). Components do not necessarily have the same size, nor the same bit depth.

Tile: Each component is divided into one or several tiles, which are then encoded independently. All tiles are the same size, except tiles at the boundaries. The purpose of tiling is to allow for the coding of very large images by reducing memory consumption and speeding up the coding process. Tiling may cause a slight decrease in compression efficiency, since the ability to exploit spatial redundancy is reduced when small tiles are used. Also, at low bit rates, blocking artifacts may be visible at tile boundaries.

Subband: Each tile in a particular component is encoded using the JPEG 2000 DWT coding algorithm. The tile is input to the wavelet transform, which generates a set of subbands representing different spatial frequency components. Due to subsampling in the wavelet transform, subband sizes decrease by a factor of four at each subband decomposition level.

Precinct: At any resolution level except level 0, a precinct contains a group of code blocks that cover the same rectangular area in all three subbands of that level. At resolution level 0, a precinct is simply a group of code blocks in the subband LL_N. A precinct therefore represents all the information in a particular image region at a certain resolution level. It is then reasonable to pack such information together in the final code stream; in the code stream, bits are organized by precincts instead of subbands.

Code block: After the quantization of all subband coefficients, each subband is partitioned into non-overlapping rectangular code blocks. These code blocks are confined within a certain subband, and have the same size except at boundaries of the subband. The maximum size of a code block is 64 × 64. They are input to the EBCOT coding engine. Each code block is encoded independently, and produces a bit stream to be included in the final code stream.

The code stream data structure

The code stream refers to the output of the encoder and the input of the decoder. Its associated data structure, sketched in Figure 3, includes the following components, in increasing order of their size.

Encoded code block: After a code block is encoded using the EBCOT coding engine, a bit stream results. This is the smallest independent unit in the JPEG 2000 code stream.

Packet: Packets are the basic data units in the final code stream. A packet contains all the encoded code blocks from a specific precinct in a tile at a quality layer. The length of a packet can be affected by the size of the precinct and the number of layers. Packets may have different lengths, but they are always aligned at 8-bit boundaries in the final code stream. Each packet has a header indicating its content and related parameters.

Layer: We have introduced the concept of quality layer, which provides finer code stream truncation points with R-D optimization. In order to preserve the embedded feature of the layered coding mechanism, the final code stream should be organized by quality layers. The lower layers should be at the front of the code stream and the higher layers should be at the end. When the code stream is truncated at the decoder, the lower quality layers will be used to reconstruct the image.

Resolution level: The resolution level is closely related to the decomposition level of the discrete wavelet transform in Figure 1. The four subbands LL_l, LH_l, HL_l, and HH_l at decomposition level l represent a particular resolution. For example, LL_1, LH_1, HL_1, and HH_1 can be used to reconstruct the image at its original resolution, and LL_2, LH_2, HL_2, and HH_2 can be used to reconstruct LL_1, which is the original image at 1/4 resolution. The N + 1 resolution levels are defined as follows: LL_N belongs to resolution level 0; LH_N, HL_N, and HH_N belong to resolution level 1; LH_{N-1}, HL_{N-1}, and HH_{N-1} belong to resolution level 2; and so on, until LH_1, HL_1, and HH_1, which belong to resolution level N.

Progression order

Because the code stream can be arbitrarily truncated at the decoder, the order of packets in the code stream affects the decoding performance. The standard defines a set of progression orders, which can easily be implemented by reordering the packets in the code stream. Each order is named by its four nested loops, listed from outermost to innermost. In the LRCP progression, for example, the quality Layer loop is outermost and the precinct (Position) loop is innermost: all precincts of a component are sequenced first, then all components at the current resolution level, then all resolution levels of the current quality layer, and finally all quality layers. Four additional progression orders are defined: RLCP, RPCL, PCRL, and CPRL. The LRCP order is accuracy (SNR) progressive, RLCP and RPCL are resolution progressive, and PCRL and CPRL are spatially progressive.
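Since each progression order is just a different nesting of the same four loops, the packet sequence can be sketched generically. The function below reads the four-letter name outermost-to-innermost; real codestreams additionally interleave tiles and let the precinct count vary per resolution level, which is ignored here:

```python
from itertools import product

def packet_order(order, n_layers, n_res, n_comps, n_precincts):
    """Generate the packet sequence for a given progression order (sketch).

    order : four-letter string such as 'LRCP' or 'RLCP', read from the
            outermost loop to the innermost (L = quality layer,
            R = resolution level, C = component, P = precinct/position).
    Yields one dict of loop indices per packet.
    """
    ranges = {'L': range(n_layers), 'R': range(n_res),
              'C': range(n_comps), 'P': range(n_precincts)}
    # itertools.product iterates its last argument fastest, which
    # matches "innermost loop last" in the progression name.
    for combo in product(*(ranges[k] for k in order)):
        yield dict(zip(order, combo))
```

Comparing LRCP with RLCP on the same counts shows the difference immediately: LRCP emits every resolution of layer 0 before touching layer 1 (quality progressive), while RLCP emits every layer of resolution 0 first (resolution progressive).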

Summary of performance analysis tests

In [8], we conducted a comprehensive performance analysis of the JPEG 2000 baseline coding algorithm. Three JPEG 2000 software implementations (Kakadu, JasPer, JJ2000) were compared with several other standard codecs, including JPEG, JBIG, JPEG-LS, MPEG-4 VTC, and H.264 intra coding. Some observations and conclusions of these tests are summarized as follows.

Photographic images: For natural images with consistent global structures (e.g. LENA and WOMAN), JPEG 2000 clearly outperforms all other codecs. This is where the DWT coding structure achieves its best efficiency. For natural images with uncorrelated regional structures or detailed textures (e.g. GOLD HILL, BIKE, and CAFE), JPEG 2000’s advantage over other codecs becomes less significant.

Synthetic and medical images: For computer graphics, compound images and some medical images (e.g. CHART and ULTRASOUND), the block-based H.264-intra codec appears to be more efficient than JPEG 2000 because of its ability to efficiently encode inter-block correlation within a single frame.

Bi-level images: JBIG and JBIG2 perform significantly better than JPEG 2000 on bi-level images such as scanned documents.

Lossless coding: JPEG 2000 in lossless mode and JPEG-LS achieve similar performance on lossless coding; however, JPEG-LS is much faster.

Large images: With conventional implementations (e.g. JasPer), tile partition is an effective tool to balance memory usage with coding performance. It can also facilitate efficient parallel processing, especially for images containing several less correlated regions.

Progressive coding: LRCP progression provides fine scale SNR progressive coding, while RLCP progression provides resolution progressive coding.

Error resilience: Both PRED-TERM/RESTART and SEGMARK are effective error-resilience mechanisms in JPEG 2000. However, combinations of these or other settings will not achieve noticeable further protection.

Conclusion

An introduction to the JPEG 2000 image coding standard was presented. The important aspects of the JPEG 2000 coding algorithm, structure specifications, and frequently used features were discussed. The description of the coding structure and a performance analysis summary provide an objective view of the advantages and disadvantages of this new standard. It is evident that JPEG 2000 will be adopted in many applications that require high image quality over bandwidth-constrained channels and media. However, it is unlikely that it will become the only image coding tool used in every still image coding application.
