Multimedia Content Adaptation

David C. Gibbon
AT&T Labs Research, Middletown, NJ, USA

Definition: Multimedia content adaptation allows the ever-increasing variety of IP-enabled devices, such as personal digital assistants (PDAs), smartphones, and set-top boxes, to access existing rich media resources available on the Internet.

Broadly, it is the process of transforming a logical set of video, images, text, and other media from a source representation to one or more target representations. These new forms of content not only accommodate restricted device input and output capabilities, but also support content delivery over bandwidth-constrained connections such as wireless links. Content adaptation techniques can be applied in multimedia indexing and browsing systems and can be used to help maintain quality of service (QoS) for multimedia network services. Note that aspects of content adaptation are often referred to as content repurposing, and sometimes as content reuse or re-authoring.

Purpose: Multimedia content authoring is a time-consuming and expensive process, often involving a production staff. Content owners maintain collections of their assets, eventually building large multimedia archives. Usually the content is designed to be consumed on a specific class of devices, such as desktop PCs. Recently, however, the availability of new wireless network services has spurred dramatic growth in the number of mobile device users. Also, broadband IP services are coming to the set-top through technologies such as ADSL2+ and IPTV. Although these emerging devices may attempt to support access to legacy content, for example by including an HTTP client, the resulting user experience is often unworkable due to device limitations.

Multimedia content adaptation maximizes the return on the authoring investment by increasing the diversity of content consumption options, thereby allowing the content to reach a wider audience. It frees authors from the task of creating a specific version for each possible user device. In fact, for legacy content, the devices may not have existed or been widely available at the time the content was produced; without content adaptation, existing archives of multimedia content would not be accessible from these emerging devices.

Additional Applications: Since multimedia content adaptation reduces the bandwidth or bit rate required to deliver the content, it can be used to maintain quality of service by reducing the resource utilization of servers and networks in a more elegant manner than strict admission control. Additionally, it increases flexibility for end users, reducing costs in cases where the data connection is billed per kilobyte, as in some GSM services. While normally intended for rendering content for end users, content adaptation is also useful as a systems preprocessing operation, e.g., feeding video input to a still image retrieval system. The distilled representations that are formed as a result of content adaptation are also relevant to content summarization and indexing. The combination of these benefits allows content adaptation techniques to be used in systems designed for searching and quickly browsing content.

In addition to device/bandwidth adaptation, other types of content adaptation are related to internationalization of content, which may involve language translation and localization issues such as character encodings. Persons with hearing impairments or other disabilities can also benefit from content adaptation techniques, because adaptation can involve media conversion to a form that is more easily interpreted.

Techniques: When designing and creating multimedia content, authors assume a particular target user context, which is a specification of device capabilities, connection bandwidth, and usage environment (e.g., whether the user is in a private office or on public transportation). In some cases, authors may plan for content adaptation by adding semantic tags, specifying allowable substitution sets, prioritizing data, or using scalable codecs. They may even design multiple versions of the presentation for a number of target devices. However, given the sheer number of possible user contexts, it is not practical to build customized versions for each device. Therefore, automatic or semi-automatic methods must be employed, and they should leverage as much knowledge as possible that has been captured from the content author through metadata annotation.

Figure 1 is a conceptual diagram showing how content adaptation occupies a central role in the process of content creation and delivery to end users. The main components of multimedia content adaptation include content analysis to extract semantics, content conversion (transcoding, modality conversion, etc.), decision-making based on user preferences or device capabilities, and view rendering or marshalling multimedia components for delivery.

As authors generate source multimedia content, they may assemble and edit existing recorded media to enhance the presentation. They may produce alternative representations designed for multiple end user devices and differing bandwidth connections. Authors create some level of content description as well; this may range from a minimal set of bibliographic attributes such as title, author name, and creation date, to a very detailed description in a markup language such as HTML or XML. For example, with HTML, the document outline level can be expressed using header tags and image descriptors may be included via the “alt” attribute. Authors may use HTML styles encoded in CSS format or, for XML, use XSL transformations to help separate content from presentation. For video and audio content, MPEG-7 specifies an XML schema for representing content descriptors. Multimedia metamodels have also been proposed to formally model the content adaptation process, taking human factors into account and using the Unified Modeling Language (UML) for model representation [6]. Since creating detailed content descriptors is tedious, automated media analysis techniques are often used to provide additional metadata for content adaptation. Analysis may include segmentation, such as identifying topic boundaries in long-form content to create smaller, more manageable segments. Pagination techniques compensate for the lack of device spatial resolution by segmenting HTML content into multiple pages to be consumed sequentially, or as a hierarchical outline-like structure. Finally, note that analysis and conversion are often implemented as a joint process rather than as separate modules as shown in the figure.
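As a rough illustration of how authored structure can feed adaptation, the following minimal sketch (using only Python's standard library; the tag choices and the sample markup are illustrative, not taken from any particular system) collects the header outline and image “alt” descriptions from an HTML page, giving a downstream step the hooks it needs to paginate the document or substitute text for images:

```python
from html.parser import HTMLParser

class StructureExtractor(HTMLParser):
    """Collect header outline entries and image 'alt' text from HTML so an
    adaptation step can paginate content or substitute text for images."""

    def __init__(self):
        super().__init__()
        self.outline = []      # (level, heading text) pairs
        self.image_alts = []   # textual surrogates for images
        self._heading_level = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self._heading_level = int(tag[1])
            self._buffer = []
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.image_alts.append(alt)

    def handle_data(self, data):
        if self._heading_level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._heading_level is not None and tag == f"h{self._heading_level}":
            self.outline.append((self._heading_level, "".join(self._buffer).strip()))
            self._heading_level = None

extractor = StructureExtractor()
extractor.feed('<h1>News</h1><img src="a.jpg" alt="Anchor at desk"><h2>Weather</h2>')
print(extractor.outline)     # [(1, 'News'), (2, 'Weather')]
print(extractor.image_alts)  # ['Anchor at desk']
```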

Device Capabilities: As indicated in Figure 1, when users consume content, they first specify their device capabilities and presentation preferences either explicitly, through a profile or active selection of content, or implicitly, by virtue of the device that they choose to use to access the content. The device itself may inform the service of its capabilities, or it may simply identify itself and rely upon the server to be aware of its capabilities. Several standards efforts address these issues, including HTTP content negotiation for HTML content, MPEG-7 user preference descriptors, MPEG-21 usage environment descriptors, and SIP feature negotiation.
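To give a concrete feel for the server-side decision, the sketch below picks one of several pre-rendered variants of an item from the request's Accept and User-Agent headers. The variant names, the media types used, and the device identifier list are illustrative assumptions, not drawn from any of the standards above:

```python
# Pre-rendered variants of one content item, keyed by media type (illustrative names).
VARIANTS = {
    "video/mp4": "story_full.mp4",        # full audio/video rendering
    "image/jpeg": "story_slides.jpg",     # key-frame slide show
    "text/html": "story_transcript.html", # text-only transcript
}

KNOWN_LIMITED_DEVICES = ("pda", "smartphone")  # hypothetical User-Agent substrings

def select_variant(accept_header: str, user_agent: str) -> str:
    """Choose a rendering from the client's Accept header and self-identification."""
    accepted = {t.split(";")[0].strip() for t in accept_header.split(",")}
    preference = ["video/mp4", "image/jpeg", "text/html"]
    if any(dev in user_agent.lower() for dev in KNOWN_LIMITED_DEVICES):
        preference.remove("video/mp4")  # never send full video to a known limited device
    for media_type in preference:
        if media_type in accepted or "*/*" in accepted:
            return VARIANTS[media_type]
    return VARIANTS["text/html"]        # fall back to the smallest rendering

print(select_variant("image/jpeg, text/html", "ExamplePDA/1.0"))  # story_slides.jpg
```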

Table 1 provides a general indication of the range of input and output capabilities (shown as the number of display picture elements) for a typical computer, television, personal digital assistant, and mobile phone. The last row indicates a rough measure of input capability as the number of keys on the device's keyboard or remote control. Of course this is just one metric, and each device has its own UI strengths; e.g., the TV may display video at higher quality than the PC, PDAs typically include stylus input with touch screens, etc.

There are many other metrics for quantifying device capabilities, including display bandwidth and the number of degrees of freedom of input devices. In addition to I/O capabilities, device profiles may also indicate computational capabilities, locale information, or maximum network interface bandwidth. For mobile devices, power consumption is a fundamental concern, and users may indicate preferences that specify scaled-down content to conserve power during content rendering and display.
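The following minimal sketch shows how such a profile might drive the choice of target frame size and rate. The profile fields, the device names, and the numeric values are assumptions for illustration only, not taken from MPEG-21 or any published vocabulary:

```python
# Illustrative device profiles; field names and values are assumptions.
PROFILES = {
    "desktop_pc": {"width": 1920, "height": 1080, "max_kbps": 5000, "battery_powered": False},
    "pda":        {"width": 320,  "height": 240,  "max_kbps": 200,  "battery_powered": True},
    "phone":      {"width": 176,  "height": 144,  "max_kbps": 64,   "battery_powered": True},
}

def target_parameters(device: str, source_width=1280, source_height=720, source_fps=30):
    """Derive target frame size, rate, and bit rate for a device, never upscaling."""
    p = PROFILES[device]
    scale = min(1.0, p["width"] / source_width, p["height"] / source_height)
    fps = source_fps
    if p["battery_powered"]:
        fps = min(fps, 15)  # halve the frame rate to reduce decode load and power draw
    return {
        "width": int(source_width * scale),
        "height": int(source_height * scale),
        "fps": fps,
        "bitrate_kbps": p["max_kbps"],
    }

print(target_parameters("pda"))
# {'width': 320, 'height': 180, 'fps': 15, 'bitrate_kbps': 200}
```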

Conversion: Adaptation of time-based media such as audio and video poses different challenges than adapting documents using the styles or pagination techniques mentioned above. Many types of devices cannot render video due to the computing and system resource requirements associated with decoding and displaying 15 or 30 frames per second. Other devices may not have audio playback capability. To address these issues, media modality conversion attempts to deliver the best possible representation of multimedia content to resource-constrained devices. The idea is to convey the content author's message, be it informative or entertaining, with as much fidelity as possible. Obviously there are limits to the applicability of modality conversion, due not only to the device in question, but also to the type of multimedia content. Since modality conversion involves extracting a concise representation of the author's message, highly structured or annotated content will be more readily processed than unstructured content. For example, modality conversion has been successfully applied to broadcast television news using key-frame extraction and exploiting the closed caption text. Another motivation for modality conversion is that it provides bandwidth reduction beyond what is possible with traditional transcoding.

Figure 2 gives an indication of the order of magnitude of the bandwidth necessary to transmit individual streaming media components.

As can be seen, transcoding reduces the bit rate within a media component, while modality conversion achieves this goal by switching among components. Note that while the primary goal of transcoding is bandwidth reduction, it also reduces the client (decoder) computational load if the frame rate or frame size is reduced, thus enabling delivery to a wider range of devices. Similarly, modality conversion not only expands the range of target device types, it also reduces communication bandwidth. The figure suggests a hierarchical priority scheme, where smaller, more critical information is retained, while larger components may be degraded or omitted if necessary. Ideally, authors will provide versions of content in several modalities and indicate allowable substitutions; otherwise, automated analysis and conversion methods must be relied upon to generate surrogates.
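That hierarchical priority scheme can be read as a simple greedy rule: keep adding components, highest priority first, while the bandwidth budget allows. A minimal sketch follows; the per-modality bandwidth figures are order-of-magnitude placeholders chosen for illustration, not values taken from the figure:

```python
# Order-of-magnitude bandwidth costs (kbit/s) per modality, in priority order:
# smaller, more critical components (text) are kept before larger ones (video).
COMPONENT_KBPS = [("text", 1), ("audio", 32), ("images", 100), ("video", 500)]

def choose_modalities(budget_kbps: float) -> list:
    """Greedily keep components, highest priority first, within the bandwidth budget."""
    chosen, used = [], 0.0
    for name, cost in COMPONENT_KBPS:
        if used + cost <= budget_kbps:
            chosen.append(name)
            used += cost
    return chosen

print(choose_modalities(40))   # ['text', 'audio']
print(choose_modalities(700))  # ['text', 'audio', 'images', 'video']
```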

Transcoding: Media components are compressed using general-purpose signal processing techniques to reduce bandwidth requirements while minimizing distortion. Transcoding involves conversion from one encoding to another: reducing the bit rate a second time (transrating) or performing format conversion, which may be necessary if the target device is not capable of decoding the content in its native format (e.g., an MP3 player that cannot decode AAC format audio).
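In practice, transrating and format conversion are often implemented by driving an external encoder. The sketch below is one such minimal approach, assuming the ffmpeg command-line tool is installed; it uses only ffmpeg's basic input, bit rate, and output options, and the file names are placeholders:

```python
import subprocess

def transrate(src: str, dst: str, video_kbps: int = 300, audio_kbps: int = 64):
    """Re-encode a clip at lower video/audio bit rates; the output container and
    codecs are inferred by ffmpeg from the destination file extension."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-b:v", f"{video_kbps}k",
         "-b:a", f"{audio_kbps}k",
         dst],
        check=True,
    )

# e.g. transrate("lecture.mp4", "lecture_mobile.mp4", video_kbps=200)
```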

If scalable encoding is used for the media components, then transcoding is not necessary; lower bit rate representations can be efficiently derived or extracted from the source material. JPEG 2000’s use of wavelet encoding for still frames intrinsically encodes multiple spatial resolutions. MPEG-4 supports spatial, temporal, and even object scalability, where a chroma-keyed layer of video may be independently transmitted.

Content-based Transcoding: Transcoding is largely content-independent and is applicable to a much broader range of content types than modality conversion methods. The limitations of encoding algorithms are related to the basic nature of the content, e.g., natural scenes vs. graphics, or audio vs. speech. However, in some cases specialized content-based processing can be used to reduce bit rates within the same modality. Motion analysis or face detection can be used to extract regions of interest from videos and still frames and dynamically crop the source material rather than simply scaling down to a smaller frame size. Along the same lines, video summarization or text summarization can be used to reduce the overall volume of data within a particular modality.
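A minimal sketch of the face-based region-of-interest idea, assuming the OpenCV library (opencv-python) and its bundled frontal-face Haar cascade; the margin value and file names are illustrative:

```python
import cv2  # assumes opencv-python is installed

# OpenCV's bundled frontal-face Haar cascade, used to find a region of interest.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_to_face(frame, margin=0.3):
    """Crop to the largest detected face (plus a margin) instead of scaling
    the whole frame down; fall back to the full frame if nothing is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face region
    dx, dy = int(w * margin), int(h * margin)
    h_img, w_img = frame.shape[:2]
    return frame[max(0, y - dy):min(h_img, y + h + dy),
                 max(0, x - dx):min(w_img, x + w + dx)]

# e.g. cv2.imwrite("anchor_crop.jpg", crop_to_face(cv2.imread("anchor.jpg")))
```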

Modality Conversion: Common examples of modality conversion include video-to-image conversion using shot boundary detection and audio-to-text conversion. Content-based video sampling goes beyond shot boundary detection in that intra-shot representative samples are collected by monitoring camera operations such as panning and zooming to more accurately convey the visual contents of a scene. The resulting set of representative images can be played back in a slide-show mode with accompanying audio or displayed as a series of web pages. The latter mode is particularly effective when combined with text from closed captioning or other transcription.
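The sketch below illustrates the simplest form of video-to-image conversion: flag a shot boundary whenever the gray-level histogram changes sharply between consecutive frames, and keep the first frame of each shot as a key frame. It assumes OpenCV, and the histogram size and threshold are tuning assumptions rather than recommended values:

```python
import cv2  # assumes opencv-python is installed

def key_frames(video_path: str, threshold: float = 0.4):
    """Naive shot-boundary detection via frame-to-frame histogram distance;
    returns one representative (key) frame per detected shot."""
    cap = cv2.VideoCapture(video_path)
    prev_hist, frames = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)],
                            [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is None or cv2.compareHist(
                prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
            frames.append(frame)  # large change: treat as a new shot, keep a key frame
        prev_hist = hist
    cap.release()
    return frames

# e.g. each frame in key_frames("news.mp4") can become one slide or web page
```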

Most modality conversion algorithms assume a particular content domain; e.g., shot boundary detection is only useful for edited or post-production video. Obviously these techniques are not suitable for stationary-camera input such as webcam feeds; instead, motion-based event detection methods are more applicable. Also, speech-to-text modality conversion using speech recognition cannot be applied directly to arbitrary audio streams that may contain music or multiple languages.
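For the fixed-camera case, a minimal motion-based event detector can be as simple as frame differencing, as sketched below; it assumes OpenCV and NumPy, and the pixel and area thresholds are placeholder assumptions to be tuned per camera:

```python
import cv2           # assumes opencv-python is installed
import numpy as np

def motion_events(video_path: str, pixel_delta: int = 25, changed_fraction: float = 0.02):
    """For stationary-camera input, return the indices of frames where a
    meaningful fraction of pixels changed relative to the previous frame."""
    cap = cv2.VideoCapture(video_path)
    prev, events, idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            changed = np.count_nonzero(cv2.absdiff(gray, prev) > pixel_delta)
            if changed > changed_fraction * gray.size:
                events.append(idx)  # candidate event frame (e.g. a person entering)
        prev = gray
        idx += 1
    cap.release()
    return events
```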

Streaming content and proxy servers: Although Figure 1 suggests an on-demand usage paradigm, content adaptation can also be used on live input streams. In the on-demand case, the content analysis, conversion, and perhaps even rendering can take place before the content is published. Then, as users request the content, the decision module can simply select the appropriate rendering, or perhaps marshal some components from sets of renderings, and deliver this to the user. However, for real-time communications such as live television or distance learning, these functions must be performed dynamically as the data flows from the producer to the consumer. In these cases, a proxy server architecture can be employed. Dynamic adaptation can also be useful for reducing the storage requirements for large multimedia collections, albeit at the cost of increased computational load. In addition to taking place at production time or at distribution time via proxies, content adaptation can take place on the client device in cases where bandwidth and device computational capabilities are not a limiting factor, e.g., as with a set-top box. Clearly the design of content distribution networks (CDNs) will be influenced by these alternative architectural decisions, which determine where adaptation takes place.

Content Personalization: Having the ability to convert a particular rich media presentation into a form that is consumable on a user's device may not be sufficient to build truly usable systems and services. Devices such as mobile phones or IP-connected televisions have severely restricted user input modalities, which make interacting with multimedia content, or even selecting content for viewing, extremely difficult. For these situations, content personalization can be employed to assist in content selection, or to go further and automatically assemble presentations based on users' interests. Users can express their interests beforehand (perhaps using a conventional PC with a keyboard) in the form of keywords, categories, news feeds, channels, etc., and this information can be stored by the service provider. Then, when accessing content from input-constrained devices, the system can apply the profile to select appropriate content, and extract relevant segments if necessary.
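A minimal sketch of profile-driven selection follows: items are scored by the overlap between the stored interest keywords and each item's metadata, and only the best matches are offered on the input-constrained device. The profile fields, catalog entries, and scoring rule are illustrative assumptions:

```python
def select_for_profile(items, profile, limit=5):
    """Score catalog items by keyword overlap with the stored interest profile
    and return the top matches for presentation on an input-constrained device."""
    interests = {k.lower() for k in profile["interests"]}
    scored = []
    for item in items:
        score = len(interests & {k.lower() for k in item["keywords"]})
        if score > 0:
            scored.append((score, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:limit]]

catalog = [
    {"title": "Evening news: markets", "keywords": ["finance", "stocks"]},
    {"title": "Local weather update",   "keywords": ["weather"]},
]
print(select_for_profile(catalog, {"interests": {"weather", "traffic"}}))
# [{'title': 'Local weather update', 'keywords': ['weather']}]
```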

Depending on the task, multimedia content adaptation may involve using a combination of techniques including content personalization, modality conversion, transcoding, and incorporating style sheets. With these tools, the goal of delivering multimedia content to a wide range of user devices can be largely achieved.
