
Multimedia Content Modeling and Personalization - Introduction, Content Modeling, Content Filtering, Content Adaptation, User Modeling, Concluding discussion


Marios C Angelides
Brunel University, Uxbridge, UK

Definition: Personalization of information requires content interpretation and modeling, suitable content retrieval and filtering, and filtered content adaptation to the usage environment.


Increasingly, users want to view multimedia content that has been created specifically for, or at least tailored to, their needs and preferences. When people watch videos, they are likely to be interested only in content that matches their request, and they often expect the content to be of a certain duration. In addition, achieving interoperability across platforms dictates that content, whether original or modified, adheres to certain standards. While some image tracking techniques create or tailor content by working directly with raw multimedia content, matching the user’s content needs and preferences becomes harder as the complexity of the user requirements increases.

Personalization of information requires content interpretation and modeling, suitable content retrieval and filtering, and filtered-content adaptation to the usage environment. Content interpretation and modeling can be done by extracting semantic and structural information about the content and by examining content meta-data. Once a content model has been built, filtering information from it assumes a user profile, or at least an expression by the users of their needs or preferences. Whilst the simplest form of information filtering requires users to create their own profile manually or with limited assistance, advanced information filtering may build a profile by applying learning techniques to direct observation of user behavior during interaction. The content may also be adapted to fit current usage conditions and user preferences. Both filtering and adaptation of multimedia content throughout use require that the user’s needs, preferences, interests, and usage environment are understood and modeled.

Content Modeling

The MPEG-7 Multimedia Description Scheme (MDS) provides the necessary tools for building multimedia content models. At the core of the MDS are the content description and content meta-data tools (see Figure 1).

The former (Structure DS and Semantic DS) can be used to describe the structural and semantic information of multimedia content. The latter (Creation & Production, Media and Usage) can be used to describe any meta-information that is available with the content. MDS also includes description schemes that model the user’s preferences in relation to video content and which can be stored under the user’s control. These schemes support personalized navigation and browsing, by allowing the user to indicate the preferred type of view or browsing, and automatic filtering of preferred content based on the user’s needs and preferences. The literature reports various tools that work with MPEG-7.

The approach followed to build the content model matters less for achieving interoperability across platforms than the compliance of the model with a standard. As a consequence, approaches vary, but most authors agree that the basic ingredients of a content model are (in no particular order of importance or modeling): objects depicted in the media stream and their visible and known properties, spatial relationships between those objects, video segments that describe events involving one or more objects, and the temporal order between segments.
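The ingredients listed above can be pictured as a small data structure. The following is only an illustrative sketch: the class and field names (`MediaObject`, `SpatialRelation`, `Segment`, `ContentModel`) are hypothetical and do not reproduce MPEG-7 description scheme names or syntax.

```python
from dataclasses import dataclass, field


@dataclass
class MediaObject:
    """An object depicted in the stream, with visible and known properties."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class SpatialRelation:
    """A relative location between two objects, e.g. striker left_of ball."""
    source: str
    relation: str
    target: str


@dataclass
class Segment:
    """A video segment describing an event that involves one or more objects."""
    label: str
    start: float  # seconds from the start of the stream (assumed unit)
    end: float
    objects: list = field(default_factory=list)
    spatial: list = field(default_factory=list)


@dataclass
class ContentModel:
    segments: list = field(default_factory=list)

    def temporal_order(self):
        """Segment labels ordered by start time, then by end time."""
        ordered = sorted(self.segments, key=lambda s: (s.start, s.end))
        return [s.label for s in ordered]

    def segments_with(self, object_name):
        """Labels of the segments in which the named object appears."""
        return [s.label for s in self.segments
                if any(o.name == object_name for o in s.objects)]


ball = MediaObject("ball", {"color": "white"})
striker = MediaObject("striker", {"team": "home"})
model = ContentModel([
    Segment("goal", 12.0, 20.0, [striker, ball],
            [SpatialRelation("striker", "left_of", "ball")]),
    Segment("kick-off", 0.0, 5.0, [ball]),
])
```

Here `model.temporal_order()` recovers the chronological order of the segments regardless of the order in which they were stored, and `model.segments_with("striker")` lists the events that involve a given object.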

Video segmentation is traditionally the first step toward multimedia content interpretation and modeling. Segmentation is usually driven by often competing factors such as visual similarity, time locality, coherence, color, motion, etc. Some authors adopt cinematic definitions and rules, such as the 180°, montage and continuity rules. Often video segments with similar low-level features or frame-level static features such as key frames are grouped together. Over the years, a plethora of different schemes has been suggested to describe the segments: scenes with shots, events with actions, events with sub-events, sequences with sub-sequences, to name just a few. The segments are then mapped into some kind of structure; before MPEG-7, a hierarchical structure or decomposition with semantic granularity increasing from top to bottom was the most dominant. It is not unusual for parts of the original stream to appear in more than one segment. This is not necessarily a fault in boundary detection techniques; a video sequence may serve multiple interpretations or perspectives of multimedia content, thereby adding multiple perspectives to a content model.

Once the video has been segmented according to one or more perspectives, the original video sequence loses its meaning in the new and often non-sequential structure. It is therefore important to describe the new chronological order of the segments. The MDS defines a new set of temporal relationships (see Table 1) in addition to those suggested by Allen in 1983. Often it is necessary to describe the temporal relationship of each segment to all other segments.
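As an illustration of describing the relationship of each segment to all other segments, the sketch below classifies a pair of segments into one of Allen's thirteen interval relations. It covers only Allen's relations, not the additional ones defined by the MDS, and it assumes that each segment is given as a hypothetical `(start, end)` pair with `start < end`.

```python
def allen_relation(a, b):
    """Return Allen's relation of interval a to interval b.

    Both intervals are (start, end) tuples with start < end.
    """
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met_by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started_by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished_by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped_by"


def pairwise_relations(segments):
    """Relation of every segment to every other segment."""
    return {(x, y): allen_relation(segments[x], segments[y])
            for x in segments for y in segments if x != y}


# Hypothetical segment boundaries in seconds.
segments = {"intro": (0, 10), "goal": (10, 25), "replay": (15, 25)}
relations = pairwise_relations(segments)
```

With these boundaries, `relations[("intro", "goal")]` is `"meets"` and `relations[("replay", "goal")]` is `"finishes"`. The table grows quadratically with the number of segments, which is one reason such descriptions are usually generated rather than written by hand.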

However, the actual content of the media stream is the objects and their absolute and relative locations in a frame, i.e. their spatial relationships to other objects (see Table 1). There are two competing camps with respect to modeling objects: those who use automated image tracking tools to identify objects in each segment and then use automated extraction tools to separate them from the background, and those who undertake the painstaking task of manual or semi-automated modeling. With the latter, there is no limit as to what can be modeled: what is visible, what is not, what can be derived, what is known, etc. It is very laborious but it usually results in very rich and multi-faceted content models. The former quickly tracks and extracts what is visible and what may be obscured, but it does not map or derive from what is known unless it is visible in the footage. Hence, some alternative modeling still needs to be done even with the former. Which of the two is the more efficient is currently subject to debate.

Spatial relationships between objects describe the relative location of objects in relation to other objects (rather than their absolute screen co-ordinates) within the segment and will differ over time. Spatial representations are not an alternative to screen co-ordinates but they complement them. Sometimes when it is difficult to derive screen co-ordinates, a spatial relationship may be the only way to model the object presence.
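When screen co-ordinates are available, a relative location can be derived from them. The sketch below assumes hypothetical bounding boxes of the form `(x_min, y_min, x_max, y_max)` in screen co-ordinates with y growing downward; the relation names are illustrative and are not taken from Table 1.

```python
def spatial_relation(a, b):
    """Coarse relative location of box a with respect to box b.

    Boxes are (x_min, y_min, x_max, y_max); y grows downward.
    Horizontal separation is checked before vertical separation.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    if ax1 <= bx0:
        return "left_of"
    if bx1 <= ax0:
        return "right_of"
    if ay1 <= by0:
        return "above"
    if by1 <= ay0:
        return "below"
    return "overlaps"


# The same two objects in two frames of one segment (made-up values).
player = (100, 50, 140, 150)
ball_frame_1 = (150, 120, 165, 135)
ball_frame_2 = (60, 120, 75, 135)

relation_over_time = [spatial_relation(player, ball_frame_1),
                      spatial_relation(player, ball_frame_2)]
```

The two entries of `relation_over_time` differ because the ball has moved, which illustrates why a spatial relationship has to be recorded against time within the segment.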

Content Filtering

The multimodal form of multimedia as well as the heterogeneity of digital media formats makes filtering an intricate problem due to the disparity between the internal structural representations of content elements and the external user requirements for content consumption. Continuous media (video and audio) require different filtering techniques from non-continuous media (still images and text), and these can be compartmentalized even further. Filtering techniques analyze content information and prepare the presentation of content recommendations using one or a combination of rule-based, content-based and collaborative filtering agents.

Rule-based filtering works with rules that have been derived from statistics, such as user demographics and initial user profiles. The rules determine the content that is to be presented to a user. Both the accuracy and the complexity of this type of filtering increase proportionally with the number of rules and the richness of the user profiles. Hence, a major drawback is that it depends on users knowing in advance what content might interest them. Consequently, with this type of filtering, the accuracy and comprehensiveness of both the decision rules and the user modeling are critical success factors.
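A minimal sketch of rule-based filtering follows. Each rule is assumed to be a pair of functions: a condition on the user profile and a test that an item must pass when the condition holds. The profile fields, catalogue entries and rules are all invented for illustration.

```python
def rule_based_filter(profile, catalogue, rules):
    """Keep the items allowed by every rule whose condition matches the profile."""
    active = [allow for applies, allow in rules if applies(profile)]
    return [item["title"] for item in catalogue
            if all(allow(item) for allow in active)]


# Hypothetical rules derived from demographics and an initial profile.
rules = [
    (lambda p: p["age"] < 12, lambda i: i["rating"] == "U"),
    (lambda p: "sport" in p["interests"], lambda i: i["genre"] == "sport"),
]

catalogue = [
    {"title": "Cartoon Cup", "rating": "U", "genre": "sport"},
    {"title": "Late Fight Night", "rating": "18", "genre": "sport"},
    {"title": "Puppet Hour", "rating": "U", "genre": "children"},
]

child = {"age": 10, "interests": {"sport"}}
adult = {"age": 30, "interests": set()}

for_child = rule_based_filter(child, catalogue, rules)
for_adult = rule_based_filter(adult, catalogue, rules)
```

Both rules fire for the child profile, so only one title survives. No rule fires for the adult profile, so nothing is filtered out, which shows how strongly the result depends on how complete the rules and the profile are.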

Content-based filtering chooses content with a high degree of similarity to the content requirements expressed either explicitly or implicitly by the user. Content recommendations rely heavily on previous recommendations; hence, a user profile delimits a region of the content model from which all recommendations will be made. This type of filtering is simple and direct but it lacks serendipity; content that falls outside this region (and the user profile) and might be relevant to a user will never be recommended. As with rule-based filtering, a major drawback is that the user requirements drive the content filtering process. Hence, this type of filtering is a combined challenge of knowledge engineering and user modeling.
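One common way to measure the similarity between a user profile and content is cosine similarity over weighted keywords. The sketch below uses it purely as an example measure; the text does not prescribe any particular one, and the keyword weights, titles and threshold are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def content_based_filter(profile, items, threshold=0.5):
    """Titles whose similarity to the profile reaches the threshold, best first."""
    scored = [(cosine(profile, features), title)
              for title, features in items.items()]
    return [title for score, title in sorted(scored, reverse=True)
            if score >= threshold]


profile = {"football": 1.0, "goal": 0.5}
items = {
    "match highlights": {"football": 1.0, "goal": 1.0},
    "tennis final": {"tennis": 1.0, "final": 0.5},
    "cooking show": {"recipe": 1.0},
}

recommended = content_based_filter(profile, items)
```

Only the title that shares keywords with the profile is returned. The other two score zero and can never be recommended, however relevant they might be, which is the lack of serendipity described above.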

With collaborative filtering, every user is assigned to a peer group whose members’ content ratings in their user profiles correlate with the content ratings in the user’s own profile, and content is then retrieved on the basis of user similarity rather than by matching user requirements to content. The peer group’s members act as “recommendation partners”, as content retrieved for them but not for a target user can be recommended. With this type of filtering, the quality of filtered content increases proportionally to the size of the user population and, since the matching of content to user requirements does not drive filtering, collaborative recommendations will not restrict a user to a region of the content model. Major drawbacks are the inclusion of new non-rated content in the model, as it may take time before it is seen and gets rated, and the inclusion of users who do not fit into any group because of unusual requirements.
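The peer-group idea can be sketched with Pearson correlation over co-rated items. Pearson correlation is used here only as one possible measure of how well ratings correlate; the users, ratings and thresholds below are invented.

```python
import math


def pearson(a, b):
    """Pearson correlation of two rating dictionaries over co-rated items."""
    common = [k for k in a if k in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / len(common)
    mean_b = sum(b[k] for k in common) / len(common)
    num = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    den = math.sqrt(sum((a[k] - mean_a) ** 2 for k in common)
                    * sum((b[k] - mean_b) ** 2 for k in common))
    return num / den if den else 0.0


def collaborative_filter(target, ratings, min_corr=0.5, min_rating=4):
    """Items rated highly by the target's peers that the target has not rated."""
    mine = ratings[target]
    peers = [u for u in ratings
             if u != target and pearson(mine, ratings[u]) >= min_corr]
    recommended = {item for u in peers for item, r in ratings[u].items()
                   if r >= min_rating and item not in mine}
    return sorted(recommended)


ratings = {
    "ann": {"a": 5, "b": 1, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 5},
    "cat": {"a": 1, "b": 5, "d": 2, "e": 5},
}

for_ann = collaborative_filter("ann", ratings)
```

Here `bob` correlates with `ann` and becomes a recommendation partner, whereas `cat` does not. An item that nobody has rated yet cannot appear in the result at all, which is the new-content drawback noted above.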

Hybrid filtering techniques are being developed on an ad hoc basis, with the aim of combining strengths and offsetting weaknesses. For example, a collaborative content-based hybrid addresses both the problem of new non-rated content in collaborative filtering and the lack of content diversity in content-based filtering.
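One simple way to combine the two techniques is to blend their scores and fall back on the content-based score when no collaborative score exists yet. This is only one of many possible ad hoc schemes; the function names, the weight and the scores (assumed to lie between 0 and 1) are hypothetical.

```python
def hybrid_score(content_score, collaborative_score, weight=0.5):
    """Blend two scores; use the content score alone for non-rated items."""
    if collaborative_score is None:
        return content_score
    return weight * content_score + (1 - weight) * collaborative_score


def hybrid_rank(content_scores, collaborative_scores, weight=0.5):
    """Titles ordered by blended score, highest first."""
    blended = {title: hybrid_score(score, collaborative_scores.get(title), weight)
               for title, score in content_scores.items()}
    return sorted(blended, key=blended.get, reverse=True)


# Made-up scores: "new release" has not been rated by any peer yet.
content_scores = {"new release": 0.75, "cup final": 0.5, "quiz night": 0.25}
collaborative_scores = {"cup final": 1.0, "quiz night": 0.75}

ranking = hybrid_rank(content_scores, collaborative_scores)
```

The non-rated title still receives a score and can be ranked, while the titles liked by peers are lifted above what their content similarity alone would give them.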

Content Adaptation

Adaptation may require that communication of filtered multimedia content take place via different interconnected networks, servers and clients that assume different Quality of Service (QoS), media modality and content scalability (spatial and temporal). Consequently, this will require either real-time content transcoding, if a multimedia object’s format has to be changed into another on the fly, or pre-stored multi-modal scalable content with variable QoS (or a hybrid of the two).

This can be achieved through a combination of MPEG-7 and MPEG-21 capabilities. MPEG-7’s Variation Description Scheme (Variation DS) enables standardized scalable variations of multimedia content and meta-data for both summarization and transcoding. While transcoding may transform the spatial and temporal relationships, code, color and properties of an object, or even remove non-essential objects completely, it seeks to preserve the content model semantics because it is sensitive to semantic content. With intramedia transcoding, content semantics are usually preserved as no media transformation takes place. With intermedia transcoding, however, the preservation of content semantics guides the process because media are being transformed from one form to another, e.g. video to text. In this case, while the visual perception of an object may have changed as a result, the semantics of the object should be preserved in the new medium.

MPEG-21’s Digital Item Adaptation (DIA) (see Figure 2) enables standardized description of a digital object, including meta-data, as a structured digital item independently of media nature, type or granularity. Consequently, the object can be transformed into, and communicated as, any medium. MPEG-21 supports standardized communication of digital items across servers and clients with varying QoS. In order to achieve this, the digital item undergoes adaptation through a resource adaptation engine and/or a descriptor adaptation engine.

It is also necessary to describe the usage environment in order to provide the user with the best content experience for the content requested. The usage environment description tools enable description of the terminal capabilities (codec capabilities, input-output capabilities, device properties) as well as characteristics of the network (network capabilities, network conditions), the user (user info, usage preferences and usage history, presentation preferences, accessibility characteristics, location characteristics), and the natural environment (location and time, audiovisual environment).
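To illustrate how a usage environment description might drive adaptation, the sketch below picks one of several pre-stored variations of the same content that fits the terminal and the network. The classes and fields (`Variation`, `UsageEnvironment`, `bitrate_kbps` and so on) are hypothetical simplifications and do not follow the MPEG-7 or MPEG-21 schemas.

```python
from dataclasses import dataclass


@dataclass
class Variation:
    """A pre-stored variation of one piece of content."""
    modality: str       # e.g. "video", "audio", "text"
    bitrate_kbps: int
    width: int          # 0 for non-visual or layout-free variations


@dataclass
class UsageEnvironment:
    """A very reduced view of terminal and network characteristics."""
    supported_modalities: set
    bandwidth_kbps: int
    screen_width: int


def select_variation(variations, env):
    """Highest-bitrate variation that the terminal and network can handle."""
    feasible = [v for v in variations
                if v.modality in env.supported_modalities
                and v.bitrate_kbps <= env.bandwidth_kbps
                and v.width <= env.screen_width]
    return max(feasible, key=lambda v: v.bitrate_kbps, default=None)


variations = [
    Variation("video", 2000, 1280),
    Variation("video", 500, 640),
    Variation("text", 5, 0),
]

phone = UsageEnvironment({"video", "text"}, bandwidth_kbps=800, screen_width=720)
slow_link = UsageEnvironment({"video", "text"}, bandwidth_kbps=100, screen_width=720)

for_phone = select_variation(variations, phone)
for_slow_link = select_variation(variations, slow_link)
```

The phone receives the lower-quality video, and the slow link falls back to the text variation, a change of modality. A fuller treatment would also take user preferences and the natural environment into account, and would return to real-time transcoding when no stored variation is feasible.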

User Modeling

Personalizing content to the needs of users requires recognizing patterns in users’ behavior. In turn, this requires a user model that stores past behavior, as this is valuable in predicting a user’s future interactive behavior. However, basing decisions merely on the past does not fully exploit the user potential; it merely stereotypes a user. Furthermore, the model needs to include personal information about the user. MPEG-7’s UserPreferencesDS allows users to specify their preferences for a certain type of content, desired content, ways of browsing, etc. Personal (i.e. demographic) information, including prior knowledge and skills, current needs, preferences, interests, goals and plans, is acquired directly from the user.

Peer group information is acquired when the user is assigned to a user group. A new member of a peer group may initially be stereotyped, often by assuming one of many stereotyped user models. Usage information is acquired by observation during interaction. Filtering shows that it is as important to link content to user needs and preferences as it is to link user needs and preferences to a peer user group in order to acquire a collective user experience.
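The three sources of user information described above (stated directly by the user, inherited from a peer-group stereotype, and observed during interaction) can be combined in a small user model. The precedence used below (explicit preference first, then observed usage, then stereotype) is an assumption made for this sketch, as are the class and method names.

```python
from collections import Counter


class UserModel:
    """Combines explicit preferences, a peer-group stereotype and observed usage."""

    def __init__(self, explicit=None, stereotype=None):
        self.explicit = dict(explicit or {})      # acquired directly from the user
        self.stereotype = dict(stereotype or {})  # assumed on joining a peer group
        self.usage = Counter()                    # acquired by observation

    def observe(self, genre):
        """Record one interaction with content of the given genre."""
        self.usage[genre] += 1

    def interest(self, genre):
        """Estimated interest in a genre, between 0 and 1."""
        if genre in self.explicit:
            return self.explicit[genre]
        total = sum(self.usage.values())
        if total:
            return self.usage[genre] / total
        return self.stereotype.get(genre, 0.0)


user = UserModel(explicit={"news": 1.0}, stereotype={"sport": 0.7})
before_usage = user.interest("sport")   # no observations yet: stereotype applies

for genre in ["film", "film", "sport"]:
    user.observe(genre)
after_usage = user.interest("sport")    # observed share of interactions
```

A new member starts from the stereotype, and the estimate moves toward the member's own behavior as soon as interactions are observed, so the model does not remain a stereotype.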

Concluding discussion

Modeling the semantic content of multimedia enables both the user and the application to engage in intelligent interaction over content and creates the impression that interaction is with the content itself. Furthermore, semantic content models enable the applications that use them to make just-in-time intelligent decisions regarding personalization of multimedia content based on the man-machine interaction and the user expectations. Having the right content filtered through and then adapted makes interaction with the multimedia content an individual, if not individually rewarding, experience. With such applications, the multimedia content to be interacted with is not a static commodity but evolves dynamically.

Unfortunately, neither standard specifies how meta-data are to be used or filtered according to user requirements. The accuracy and reliability of filtering meta-data to user requirements rely largely on two factors: the richness and depth of detail used in the creation of the meta-data model and the level of content-dependency of the filtering techniques employed. Furthermore, the interoperability of user-driven information retrieval, especially across platforms, is greatly enhanced if the underlying process is standardized and consumption of multimedia content is adapted to suit each user individually. Consequently, whilst it has become necessary to work with standards when modeling, filtering and adapting multimedia content, the entire process of doing so still remains relatively open to individual interpretation and exploitation.
