Multimedia Streaming on the Internet - Compression, QoS Control/Monitor, Streaming Protocols, Media Synchronization

K.M. Ho and K.T. Lo
The Hong Kong Polytechnic University, Hong Kong, China

J. Feng
City University of Hong Kong, Hong Kong, China

Definition: Streaming is an enabling technology for providing multimedia data delivery between clients in various multimedia applications on the Internet.

The Internet has seen remarkable growth since its appearance. Web browsing and file transfer are the dominant services provided through the Internet. However, these services, which deliver text, pictures and documents, no longer satisfy the demands of clients. With recent advances in digital technologies, such as high-speed networking, media compression and fast computer processing, more and more multimedia applications involving digital audio and video are coming into practice on the Internet.

Streaming is an enabling technology for providing multimedia data delivery between (or among) clients in various multimedia applications on the Internet. With this technology, the client can play back the media content without waiting for the entire media file to arrive. Compared with conventional data communication, delivery of multimedia data has more stringent requirements on network bandwidth, delay and loss. However, the current Internet is inherently a packet-switched network that was not designed to handle isochronous (continuous time-based) traffic such as audio and video. The Internet provides only best-effort service and offers no guarantee on the quality of service (QoS) for multimedia data transmission. As a result, there are still many open problems in designing protocols and transmission strategies for multimedia streaming.

Figure 1 depicts the architecture of a typical multimedia streaming system on the Internet. Video and audio compression algorithms are first applied to the raw audiovisual data to make storage and transmission efficient. Streaming protocols give the client and the server the means for service negotiation, data transmission and network addressing. When a request for service is received, the server decides whether to accept it based on information from the service manager. If the request is accepted, resources are allocated. Media contents retrieved from the storage device are packetized together with media information, such as timestamps, and then delivered to the client. If the server cannot fulfill the request, the client may be blocked or queued in the system. Each packet arriving at the client is decapsulated into media information and media content. The QoS monitor uses the media information to analyze the network condition and feeds the result back to the QoS control in the server, which adapts the QoS requirements accordingly. The media content, in turn, is decoded and passed to the application for playback. Audio and video may be transmitted in separate streams. To keep the various streams synchronized, media synchronization mechanisms are required.
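The accept, block or queue decision described above can be sketched in a few lines. This is only an illustrative sketch, assuming that bandwidth is the single resource tracked by the service manager; the class name `StreamingServer`, its methods and the first-come-first-served waiting queue are hypothetical and are not taken from the article.

```python
from collections import deque


class StreamingServer:
    """Toy admission control: accept a request if bandwidth is free, else queue it."""

    def __init__(self, capacity_kbps):
        self.free_kbps = capacity_kbps   # bandwidth not yet allocated
        self.waiting = deque()           # queued (client_id, rate_kbps) requests

    def request(self, client_id, rate_kbps):
        """Allocate resources if possible, otherwise enqueue the client."""
        if rate_kbps <= self.free_kbps:
            self.free_kbps -= rate_kbps
            return "accepted"
        self.waiting.append((client_id, rate_kbps))
        return "queued"

    def release(self, rate_kbps):
        """Return bandwidth and admit waiting clients in arrival order."""
        self.free_kbps += rate_kbps
        admitted = []
        while self.waiting and self.waiting[0][1] <= self.free_kbps:
            client_id, rate = self.waiting.popleft()
            self.free_kbps -= rate
            admitted.append(client_id)
        return admitted


server = StreamingServer(capacity_kbps=1000)
print(server.request("a", 600))   # enough bandwidth is free
print(server.request("b", 600))   # not enough left, so the client waits
print(server.release(600))        # client "a" finishes; "b" is admitted
```

A real server would also account for disk throughput, buffer space and CPU, and might reject rather than queue a request that can never be served.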


Compression

Because the large volume of raw multimedia data imposes a stringent bandwidth requirement on the network, compression is widely employed to achieve transmission efficiency. Video has a larger bandwidth requirement (56 Kbps-15 Mbps) than audio (8 Kbps-128 Kbps), and loss of audio is more annoying to humans than loss of video, so audio is given higher priority for transmission in a multimedia streaming system. Hence, only video is used for adaptation in order to meet the QoS requirements. The following therefore focuses on the features of video compression that are useful for adaptation.

Video compression schemes can be classified into two types: scalable and non-scalable. With scalable coding, streams of different rates can be extracted from a single stream when required. Hence, a single stream can suit the requirements of different clients in a heterogeneous network environment. Because the encoder may not know the network condition, the traditional scalable coding approach, which provides only a step-like quality enhancement, may not be able to fully utilize the available bit-rate of the channel. In addition, the decoder may not be able to decode all the received data fast enough for reconstruction. Therefore, the objective of video coding for multimedia streaming is to optimize the video quality over a given bit-rate range instead of at a single given bit-rate. The bitstream should also be partially decodable at any bit-rate within that range, so that the video can be reconstructed with optimized quality. To meet these demands, a new scalable coding mechanism, called fine granularity scalability (FGS), was proposed in MPEG-4.

The block diagram of the FGS video encoder is illustrated in Figure 2. An FGS encoder compresses raw video data into two streams, a base layer bitstream and an enhancement layer bitstream. Like a traditional video encoder, it relies on two basic methods for compression: intra-frame DCT coding to reduce spatial redundancy and inter-frame motion compensation to reduce temporal redundancy.

Unlike a traditional scalable encoder, the FGS encoder produces the enhancement stream using bitplane coding. It codes the difference between the DCT coefficients of the reconstructed frame and those of the original frame, and then extracts the bits of equal significance from the 64 DCT coefficients to form a bitplane (BP). Therefore, all the most significant bits (MSB) from the 64 DCT coefficients form BP-1, all the second MSBs form BP-2, and so on (see Figure 4). With this coding technique, the encoder can truncate the bitstream of the enhancement layer anywhere to achieve continuous rate control. Figure 3 shows the block diagram of the FGS video decoder, which operates in the reverse manner to the encoder. Unlike a conventional decoder, the FGS decoder can partially decode the received bitstream based on the currently available resources (e.g. computational resources) in order to construct the frame before its predicted playback time.
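The bitplane idea can be made concrete with a small sketch. It is a simplified illustration, not the MPEG-4 FGS bitstream syntax: it handles only the magnitudes of the 64 residual coefficients (sign bits and entropy coding are omitted), and the function names `to_bitplanes` and `from_bitplanes` are hypothetical.

```python
def to_bitplanes(residuals):
    """Split coefficient magnitudes into bitplanes, most significant plane first."""
    mags = [abs(r) for r in residuals]
    num_planes = max(max(mags).bit_length(), 1)
    planes = []
    for shift in range(num_planes - 1, -1, -1):
        # One bit of equal significance taken from every coefficient.
        planes.append([(m >> shift) & 1 for m in mags])
    return planes


def from_bitplanes(planes, total_planes):
    """Rebuild magnitudes from the planes received; missing low planes count as zero."""
    if not planes:
        return []
    mags = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        shift = total_planes - 1 - i
        mags = [m | (bit << shift) for m, bit in zip(mags, plane)]
    return mags


# 64 residual magnitudes for one block; only the first four are non-zero here.
residuals = [10, 0, 6, 3] + [0] * 60
planes = to_bitplanes(residuals)             # 4 planes, since 10 needs 4 bits

# Truncating after BP-2 still yields a coarse approximation of the block.
coarse = from_bitplanes(planes[:2], len(planes))
print(coarse[:4])                            # [8, 0, 4, 0]

# Receiving every plane reproduces the magnitudes exactly.
print(from_bitplanes(planes, len(planes)) == residuals)   # True
```

Because every additional plane refines the values already received, the stream can be cut after any plane and the decoder still reconstructs something useful, which is the property that allows continuous rate control.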

QoS Control/Monitor

The dynamic nature of the Internet introduces unpredictable delay and packet loss to the media streams during transmission, which may affect the presentation quality. The QoS Control/Monitor mechanism aims to avoid congestion and to maximize the presentation quality in the presence of packet loss. Two techniques, congestion control and error control, are deployed in the end-system, without aid from the network, to provide a certain level of QoS support to the system. Figure 5 summarizes different congestion control and error control techniques.

Each router has a finite storage capacity, and all the stream flows attached to the router compete for this capacity. If the router has enough resources to serve all its attached flows, it operates normally. However, when the data flows reach the capacity of the router, the router starts to drop packets. Excessive queuing time and packet drops in the router result in excess delay and bursty loss, which have a devastating effect on the presentation quality of media contents. Congestion control therefore aims to minimize the possibility of network congestion by matching the rate of the multimedia streams to the available network bandwidth.

Two approaches are widely used for congestion control: rate control and rate shaping. The former determines the transmission rate of media streams based on the estimated network bandwidth, while the latter uses filtering to match the rate of a precompressed media bitstream to the target rate constraint. Depending on where in the system it takes place, rate control can be categorized into three types: source-based, receiver-based and hybrid.

With source-based rate control, only the sender (server) is responsible for adapting the transmission rate. In contrast, in the receiver-based method the client regulates the receiving rate of the streams. Hybrid rate control employs both schemes at the same time, i.e. both the server and the client participate in rate control. Typically, the source-based scheme is used in either unicast or multicast environments, while the receiver-based method is deployed in multicast only. Each of these rate-control mechanisms uses either a window-based or a model-based approach for rate adaptation. The window-based approach uses probing experiments to examine the availability of network bandwidth: if no packet loss is observed, the sender increases its sending rate; otherwise, it reduces its sending rate. The model-based approach uses a throughput model of the Transmission Control Protocol (TCP) to determine the sending rate (λ), which is characterized by:

$$\lambda = \frac{1.22 \times MTU}{RTT \times \sqrt{p}}$$

where MTU is the maximum transmission unit, RTT is the round-trip time of the connection and p is the packet loss ratio. This approach prevents congestion in a way similar to TCP. On the other hand, because stored media contents are pre-compressed at a certain rate, the current network condition may not be able to support that rate. Rate shaping addresses this by applying a filter-like mechanism for rate adaptation of media contents, using a frame drop filter, a layer drop filter or a transcoding filter [17]. The frame drop filter reduces the data rate of the media content by discarding a number of frames. The layer drop filter drops (video) layers according to the network condition. The transcoding filter transcodes between different compression schemes to achieve the target sending rate.
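The two rate-adaptation approaches can be sketched as follows. The model-based function assumes the commonly cited TCP throughput approximation, rate = 1.22 × MTU / (RTT × √p); the window-based function assumes an additive-increase, multiplicative-decrease policy, which is one possible reading of "increase on no loss, reduce otherwise". The function names and the default step sizes are hypothetical.

```python
import math


def tcp_friendly_rate(mtu_bytes, rtt_s, loss_ratio):
    """Model-based sending rate in bytes per second (TCP throughput approximation)."""
    if rtt_s <= 0 or not 0 < loss_ratio <= 1:
        raise ValueError("rtt_s must be positive and loss_ratio must be in (0, 1]")
    return 1.22 * mtu_bytes / (rtt_s * math.sqrt(loss_ratio))


def window_probe_step(rate, loss_seen, increase=50_000.0, decrease=0.5):
    """Window-based probing: raise the rate while no loss is seen, cut it on loss.

    The additive increase and multiplicative decrease constants are assumptions.
    """
    return rate * decrease if loss_seen else rate + increase


# 1500-byte packets, 100 ms round-trip time, 1% loss: roughly 183,000 bytes/s.
print(tcp_friendly_rate(1500, 0.1, 0.01))

# Probe upwards twice, then back off after a loss report.
rate = 100_000.0
for loss in (False, False, True):
    rate = window_probe_step(rate, loss)
print(rate)   # 100000.0
```

Note how the model-based rate falls as either the round-trip time or the loss ratio grows, which is what makes the stream behave fairly alongside TCP flows sharing the same path.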

Presentation quality degrades when packets are misrouted, dropped by a router, or arrive too late to meet their predicted playback time and so become obsolete. To enhance the quality in the presence of packet loss, error control should be deployed. Lost packets can be recovered by the traditional methods of forward error correction (FEC) and retransmission. The principle of FEC is to add extra information to a compressed bitstream. Media contents are first packetized, and every k packets form a group. Each group is passed to an FEC encoder, which generates n packets (n>k). To reconstruct the original group of packets, the receiving side needs only any k of the n packets (see Figure 6).
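The (n, k) grouping can be illustrated with the simplest possible code, n = k + 1, in which one XOR parity packet is added to each group of k packets. This is a sketch under stated assumptions: all packets in a group have equal length, at most one packet per group is lost, and the function names are hypothetical. Practical systems use stronger codes (for example Reed-Solomon) to tolerate more than one loss per group.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def fec_encode(group):
    """Turn k equal-length packets into n = k + 1 packets by appending XOR parity."""
    parity = bytes(len(group[0]))
    for packet in group:
        parity = xor_bytes(parity, packet)
    return list(group) + [parity]


def fec_decode(received, k):
    """Recover the k source packets from n = k + 1 slots, where None marks a loss."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot recover more than one lost packet")
    received = list(received)
    if missing and missing[0] < k:
        # XOR of all surviving packets (including parity) equals the lost packet.
        size = len(next(p for p in received if p is not None))
        rebuilt = bytes(size)
        for p in received:
            if p is not None:
                rebuilt = xor_bytes(rebuilt, p)
        received[missing[0]] = rebuilt
    return received[:k]


group = [b"AAAA", b"BBBB", b"CCCC"]      # k = 3 source packets
coded = fec_encode(group)                # n = 4 packets on the wire
coded[1] = None                          # the second packet is lost in transit
print(fec_decode(coded, k=3) == group)   # True
```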

Retransmission simply asks the sender to resend a lost packet once the loss has been detected. In addition, error control mechanisms such as error resilience and error concealment have been developed to minimize the degradation of visual quality when loss is present. Error resilience, executed by the sender, attempts to prevent error propagation or to limit the scope of damage at the compression layer. Re-synchronization marking, data partitioning and data recovery are included in the standardized error-resilient encoding scheme. Error concealment is performed by the receiver after packet loss has occurred. It tries to conceal the lost data and make the presentation less annoying to the viewer. With error concealment, missing information at the receiver is reconstructed using neighboring spatial information in the current frame or temporal information from previous frames.
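A minimal sketch of receiver-side concealment follows. It is illustrative only: a frame is modelled as a small 2-D list of sample values with `None` marking lost data, temporal concealment copies the co-located value from the previous frame, and spatial concealment averages the available neighbors. The function name `conceal` and this data layout are assumptions; real decoders work on blocks and often use motion vectors.

```python
def conceal(current, previous=None):
    """Fill lost samples (None) from the previous frame or from spatial neighbors."""
    rows, cols = len(current), len(current[0])
    out = [row[:] for row in current]
    for r in range(rows):
        for c in range(cols):
            if current[r][c] is not None:
                continue
            if previous is not None:
                # Temporal concealment: reuse the co-located sample.
                out[r][c] = previous[r][c]
            else:
                # Spatial concealment: average the received neighbors.
                neighbors = [
                    current[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                    and current[rr][cc] is not None
                ]
                out[r][c] = sum(neighbors) / len(neighbors) if neighbors else 0
    return out


damaged = [[10, None], [30, 40]]
earlier = [[9, 19], [29, 39]]
print(conceal(damaged, earlier))   # [[10, 19], [30, 40]]
print(conceal(damaged))            # [[10, 25.0], [30, 40]]
```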

Streaming Protocols

Streaming protocols give the client and the server the means for service negotiation, data transmission and network addressing. Protocols relevant to multimedia streaming can be classified into three categories: network-layer protocols, transport protocols and session control protocols. The network-layer protocol, served by IP, provides basic network services such as address resolution and network addressing. Transport protocols, such as TCP, UDP and RTP/RTCP, provide end-to-end transport functions for data transmission. Session control protocols, e.g. RTSP or SIP, define the messages and procedures that control the delivery of multimedia data. The whole picture of these protocols in the system is depicted in Figure 7.

Before the multimedia data can be delivered properly, a session should be established between end-points to negotiate the services based on their capabilities and requirements. Depending on the service requirements, different session protocols can be employed. The Real-Time Streaming Protocol (RTSP) is used for controlling the delivery of data with real-time properties in a streaming system. RTSP also provides VCR-like functions to control either a single stream or several time-synchronized streams of continuous media between the server and the client. While RTSP is suitable for media retrieval systems, another protocol, the Session Initiation Protocol (SIP), is mainly designed for interactive multimedia applications, such as Internet phone and video conferencing. Once the session has been established and the required services have been negotiated successfully, compressed multimedia data is retrieved and packetized in the RTP module. RTP defines a way to format the IP packets carrying multimedia data and provides information on the type of data transported, timestamps for synchronizing multiple streams, and sequence numbers for packet reordering and loss detection. RTP itself does not guarantee QoS or reliable delivery, so RTCP is designed to work in conjunction with RTP to provide QoS feedback information, such as packet loss ratio and inter-arrival jitter, to the system. The system (QoS control/monitor) uses this information to evaluate the network condition and react with suitable actions, such as rate adaptation. The RTP packets are then passed to the UDP/IP layer for transmission over the Internet. At the client, the media streams are processed in the reverse manner before playback.
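To show where the payload type, sequence number and timestamp actually live, the sketch below packs and parses the 12-byte fixed RTP header and updates the running inter-arrival jitter estimate that RTCP reports, following the layout and the jitter formula of RFC 3550. It is a simplified sketch: padding, header extensions and CSRC entries are assumed absent, the helper names are hypothetical, and payload type 96 is just an example value from the dynamic range.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRC list)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Parse the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def update_jitter(jitter, prev_transit, transit):
    """Running inter-arrival jitter estimate: J += (|D| - J) / 16."""
    d = abs(transit - prev_transit)
    return jitter + (d - jitter) / 16.0


header = pack_rtp_header(payload_type=96, seq=1000, timestamp=90000, ssrc=0x1234)
fields = unpack_rtp_header(header + b"payload")
print(len(header), fields["seq"], fields["timestamp"])   # 12 1000 90000
```

A receiver can detect loss from gaps in `seq`, reorder packets by it, and schedule playback from `timestamp`; the jitter value is one of the figures fed back to the QoS monitor.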

Media Synchronization

Because packets may take different routes and incur unpredictable delay during transmission, media streams may lose synchronization. Therefore, a media synchronization mechanism is needed to maintain the original temporal relationships within one media stream or among various media streams, so that the media contents can be presented properly. There are three levels of synchronization, namely intra-stream, inter-stream and inter-object synchronization. Intra-stream synchronization maintains the continuity of the stream itself: each received video/audio frame should be played back within its predicted playback time; otherwise, the presentation will be interrupted by pauses or gaps. Inter-stream synchronization aims at maintaining the temporal relationship among various media streams; for example, an audio frame should be played back with its corresponding video frame, in the same way as they were originally captured. Inter-object synchronization is used to synchronize the media streams with time-independent data such as text and still images. For example, in a slide show each slide should appear together with its corresponding audio commentary (see Figure 8).
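The first two synchronization levels can be expressed as simple timestamp arithmetic. The sketch below assumes that every frame carries a media timestamp in clock ticks, that both streams start at media time zero on a common reference clock, and that the receiver adds a fixed playout delay to absorb jitter. All names and the example clock rates (90000 Hz for video, 8000 Hz for audio) are illustrative assumptions.

```python
def playout_time(media_ts, clock_rate, session_start, playout_delay):
    """Wall-clock time (seconds) at which a frame should be presented."""
    return session_start + media_ts / clock_rate + playout_delay


def is_on_time(arrival, media_ts, clock_rate, session_start, playout_delay):
    """Intra-stream check: a frame arriving after its playout time is too late."""
    return arrival <= playout_time(media_ts, clock_rate,
                                   session_start, playout_delay)


def stream_skew(audio_ts, audio_rate, video_ts, video_rate):
    """Inter-stream check: positive skew means the audio frame is ahead of the video."""
    return audio_ts / audio_rate - video_ts / video_rate


# A video frame stamped 90000 ticks (1 s of media) with a 200 ms playout delay.
deadline = playout_time(90000, 90000, session_start=10.0, playout_delay=0.2)
print(is_on_time(11.1, 90000, 90000, 10.0, 0.2))   # True: arrived before the deadline
print(is_on_time(11.5, 90000, 90000, 10.0, 0.2))   # False: would cause a gap

# Audio and video frames captured at the same instant have zero skew.
print(stream_skew(8000, 8000, 90000, 90000))       # 0.0
```

A player would typically delay or drop frames to keep the skew within a small tolerance; the threshold at which the mismatch becomes noticeable depends on the application.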
