Other Free Encyclopedias » Online Encyclopedia » Encyclopedia - Featured Articles » Contributed Topics from K-O

Optimization of Videocontent Descriptions for Retrieval - Abstract, Introduction, Normalized audiovisual content description, Our approach, Vector Space Model (VSM)

xml search descriptors method

Definition: XML descriptions need to be optimized to speed up the matching and retrieval processes for video content.


The search of audiovisual information becomes today a peremptory necessity for a number of applications (for example: movie production) that require frequent accesses to audiovisual content. The search of information requires audiovisual content descriptions, and preferably content descriptions based on a standard, such as XML. The standard is necessary to share content descriptions by different search tools. XML is suitable to represent the complexity of descriptions; however, it is not suitable to support efficient matching and retrieval. Therefore, XML descriptions need to be optimized to speed up the matching and retrieval processes. And this is the scope of this article: developing a method that deals with this shortcoming of XML descriptions.

The method proceeds in two steps: Firstly, it transforms audiovisual content description from XML space into a multidimensional space, composed of numeric vectors. Secondly, the vectors are organized in a new multidimensional indexing structure, called Kpyr. Kpyr combines a pyramid technique and a clustering method to avoid the disadvantages of the current approaches that are not efficient when data are not uniformly distributed in the multidimensional space. Experiment results confirm our expectations: good quality of answers in a relatively short time.


In recent years, the size of digital video archives has grown enormously in many application domains, such as news agencies, TV broadcasters, and advertising agencies that run large digital archives of movies and video footage. The data volume growing and the use of those increasingly turned towards audiovisual professionals made video information retrieval a major problem. Many research efforts and progresses have been deployed by researchers these last years. Although these efforts and progresses, tools available remain insufficient or unsuited, and the need to study and analyze for better managing this mass of information is strongly made feel.

When retrieval process and standard representation are considered, an important mass of content descriptions are generated, using XML standard format. We use also the term of normalized (standardized) content descriptions. Normalized content descriptions are powerfulto represent the complexity of descriptors. However, they are not designed for efficient matching and retrieval, when dealing with high number of descriptors and thousands of video hours. And this is the major shortcoming of normalized content descriptions.

This article deals with this major shortcoming of the normalized audiovisual content descriptions, based on XML format. Historically, the use of this standard in content description is partly the result of multimedia content description interface (MPEG-7) recommendations, actually not really used in audiovisual industry; however, it had the merit to promote XML for multimedia content description.

To deal with this shortcoming, our method transforms the normalized audiovisual content description into vectors of a multidimensional space model. And then, the vectors are organized efficiently using our multidimensional indexing method, Kpyr. In such representation, user queries and video segments are both represented by vectors in the multidimensional space model. The length of dimension is equal to the number of descriptors by video segments.

The paper is organized as follows: Section 2 describes the normalized audiovisual content description. Section 3 describes the vector space model and our indexing method. Section 4 details the implementation of the search engine that includes our method. Section 5 concludes the paper and addresses future works.

Normalized audiovisual content description

We use the term of “normalized” or “standard” to highlight XML format. So, normalized audiovisual content description means audiovisual content description based on XML format. We use also the term of standardized or normalized meta-data. In our point of view, MPEG-7 standard boosted the use of XML for audiovisual content description.

The use of XML to describe audiovisual content is the result of two important properties of XML. Firstly, XML format is able to describe the complexity of audiovisual descriptors. Secondly, XML is a standard, so it is easy to share the standardized representation by different search tools. And more generally, the interest of the standard is to support interoperability of search tools. In other way, we can see audiovisual content description based on MPEG-7, as a particular case, of normalized audiovisual content description.

Annotating audiovisual content for retrieval requires the generation of masses of descriptions, which must be filtered, indexed and stored efficiently to ensure fast information accesses. Our research objective aims at managing masses of normalized audiovisual content descriptions. More particularly, our research objective is to develop tools around the normalized content description, in order to allow efficient audiovisual content search (Figure 1).

Normalized audiovisual content description is composed of audiovisual descriptors and description schemes. The descriptors define the syntax and the semantics of audiovisual features. The description schemes specify, basically, the structure and semantics of relationships between descriptors. In general, it is composed of descriptors and sub-description schemes.

In the context of our application, we consider about sixty audiovisual descriptors, and description schemes. The majority of descriptors are included in the text annotation description scheme. These audiovisual descriptors are associated to thousands of video segments of audiovisual database. These descriptors and description schemes constitute the normalized audiovisual content description. And, they are generated automatically from professional annotation tools.

The efficiency of audiovisual searches depends mainly of the access method to the normalized audiovisual content description. These accesses include frequent matching and retrieval. We propose a method that optimizes normalized audiovisual content description for efficient matching and retrieval. The optimization consists of transforming normalized descriptions into numeric vectors, and creating of an index structure suitable for frequent matching and retrieval. In this paper, we will focus on these points: creating a suitable index structure in vector space model.

The numeric vectors consider both numeric and semantic descriptors. The numeric descriptors have numeric values, which admit orders (ex. Media time point, Media increment duration). And semantic descriptors represent semantic information such as (Free text annotation: mountain, rivers, and forest). In this step of work, we didn’t consider proper noun (ex. Michel, Dupond,). Each audiovisual segment is a numeric vector (point) in the space model.

Our approach

Our approach proceeds as shown in Figure 2. First, audiovisual content descriptions based on XML are transformed (coded) into numeric vectors of a multidimensional space model. Second, vectors are organized in efficient data structure. In addition, the query processing needs two different algorithms: the first one uses a Boolean search on the whole audiovisual bases in order to consider the proper nouns; the second one consists of transforming the query into a point in the vector space and doing search in the indexing structure. First, we describe the vector space model we use. Then, we present our indexing structure.

Vector Space Model (VSM)

The vector Space Model transforms XML description of audiovisual segments into points P in the space of N dimensions: P (x0, x1, xN). The size of dimensions (N) is equal to the number of audiovisual descriptors, excluding proper nouns. So, each audiovisual descriptor corresponds to a dimension in the space. N dimensions are resulting from numeric semantic descriptors N’ < N. N” remaining dimensions corresponds to proper noun descriptors, and determined as following:

  • We get all the key words present in MPEG-7 document describing the contents of video sequences.
  • The set of all proper nouns is classified by categories, for example Actor, Producer Those are stored in hash tables and will be used for a Boolean research only on the proper nouns.
  • Using a thesaurus, all keywords are classified into several categories e.g. River, mountain, animals belong to Landscape Category.
  • Each category corresponds to a dimension. We obtain thus the N” dimensions of the space.

The transformation of a video sequence into a point P (x0, x1,…,xN) in this space is done following these rules:

The set of coordinates is normalized xi € [0, 1], 0<i<N. If dimension corresponds to a numerical descriptor, then xi is equal to the normalized value of this descriptor:

If the dimension corresponds to a category resulting from semantic descriptors, value xi is equal to:

Figure 3 gives an example of a vector space model related to normalized audiovisual content description. For semantic descriptors, using a thesaurus, all key words belonging to the FreeTextAnnotantion (FTA) Descriptor are classified into several categories e.g. River, mountain, animals belong to Landscape Category. The method extracts the frequency of key word by category. So, it counts the number of word keys by category. For numerical descriptors, we keep the value. The obtained vector will be then normalized.

Once the transformation of data into Vector Space Model is established, then the treatment will relate only to the multidimensional points corresponding to the video sequences. Below we describe the method we develop to index the set of these points.

Kpyr: our indexing structure

In a first time we present briefly some related works of multidimensional indexing method then we give an overview of Kpyr method.

Related Work

Many indexing structures have been proposed in the literature. Among them, we have VA File, Pyramid technique, IQ-Tree, LPC File , and Idistance. The known Pyramid Technique, appeared in 1998, deals with the « curse of dimensionality ». Its performances are not affected in high dimensional space. However, its effectiveness depends on the data distribution and the volume of the database. It is also sensitive to the position of the queries in the space. In some cases, a sequential scan over all the data points may be more effective. The Pyramid Technique has been used by other indexing methods appeared more recently, such as P+Tree. However those methods don’t offer efficient structures that support both type of queries: KNN and WQ, furthermore they are strongly dependent of the workload (data size, dimensionality and data distribution).

Overview of Kpyr

Kpyr Algorithm follows four main steps:

  • For a better clustering technique of data, we use K-means algorithm. We obtain as results K homogenous clusters.
  • For each cluster, we transform the corresponding subspace into a hypercube unit so that the Pyramid technique can be applied (transformation base function).
  • We apply pyramid technique and obtain one B+Tree by cluster.
  • We determine borders delimiting the space regions represented by clusters. “A binary space tree” will be built from these borders. As shown in Figure 8, leaf nodes store pointers to B+Trees corresponding to different clusters. Others nodes represent the borders.

The search is done in two steps. The first one consists in determining the regions concerned with research using the borders. We deduce from them the clusters reached by the request. The second one corresponds to WQ inside the pyramid, associated to clusters.

Window query follows two steps. In the first one, the regions of points (vectors) concerned by the search are extracted, and then clusters, that are target of search, are deduced. The region extraction is based on borders in the binary space tree. In the second one, we search data points in all B+Trees of concerned clusters.

The contribution of our method is to provide the best conditions to apply the Pyramid Technique: The resulting indexing is more effective. Indeed, the space and the number of data to be indexed by Pyramid technique are reduced. The clusters are globally homogeneous; their center is the top of each pyramid, a reference point for the points of the cluster.

The calculation of the borders is achieved on data already clustered, which allows a better division of space than P+Tree using a approximated version of Bisecting K-means. Borders are solely used to determine the clusters concerned by the query, which efficiently reduces the search space and therefore the response time. Algorithms and experiments of our indexing method Kpyr have been presented in.

Search Engine

One goal of our project is to build a tool for video sequences retrieval. This search engine has been realized in order to be used on line. The search made by end users is possible through three steps: the first step concerns the search interface where the end user makes his query. The end user can choose a category within a list of categories concerning the application field. Categories are corresponding to company films field. After choosing a category, our interface gives the end user a list of frequent keywords links to this category. For example, « professional sector » category, we have « office », « industrial workshop », « Commerce »… as frequents keywords linked. The end user can choose categories and keywords he likes, before submitting his search. In the figure example, we got three categories and keywords.

Second step of the search send the end user query to the server that contains our database and our indexing structure created off line.

The last step concerns the results of the search. Indeed, as shown in Figure 4, all video sequences resulting from this search appeared in our result interface with some information related to each sequence. We print in screen the pertinence value, the video title, the number of videotape linked to the same project, the beginning of the sequence in the videotape, its duration and a key image of the video sequence. A detail button has been added if the end user needs to see the normalized audiovisual content description.


More and more experimental search engines are based on normalized audiovisual content descriptions (normalized meta-data). The notion of normalized meta-data is strongly linked to XML standard. The use of the standard for audiovisual content description has become a reality on the basis of multimedia content description interface recommendations, known as MPEG-7. Using XML to describe audiovisual content has several advantages. Firstly, XML is a standard. So, different search tools may access to the same audiovisual content descriptions. Secondly, XML is powerful enough to represent complexity of audiovisual descriptors.

The XML standard, even if suitable to deal with complexities of audiovisual descriptors, is just a model of representation. It was not designed for efficient matching and retrieval processes. This is the role of non-normative methods around the standard necessary to exploit it. This point was particularly underlined during MPEG-7 recommendations. So, there are needs of methods or system mechanisms necessary to make efficient the use of the standard.

This article describes a method necessary to make efficient the use of the audiovisual content description based on XML in matching and retrieval processes. The method proceeds initially by a transformation of XML audiovisual content descriptions into a vector space model. Then, a new indexing technique is designed, implemented and tested, called Kpyr. Kpyr clusters vectors into categories and represents category vectors into pyramid structures. The main idea of the technique is the association of pyramid structures with categories (clusters) of vectors. The best state of art techniques, including pyramid and P+-tree, offer good performances in high dimensional spaces, and propose a powerful indexing method when the data are distributed homogeneously in the representation space. However, when data are not distributed homogeneously in the multidimensional space, these techniques loose their performances. That is why, firstly, our method clusters vectors, and then applies a pyramid technique in clusters. Globally, clusters contain homogeneous vectors. The cluster is considered homogeneous when the distortion of vectors is minimum. It means that the mean distance between the vectors and their center of gravity is minimum, on the basis of partition clustering.

Orb Photography [next] [back] Oppenheimer, (Julius) Robert - SCIENCE AND THE SECOND WORLD WAR (1939–45)

User Comments

Your email address will be altered so spam harvesting bots can't read it easily.
Hide my email completely instead?

Cancel or