
Video Usage Mining - Introduction, Related work, Proposed approach, Future work and research

user viewing based clustering

Sylvain Mongy, Fatma Bouali, and Chabane Djeraba
University of Lille, Lille, France

Definition: Video usage mining refers to the analysis of user behavior in large video databases.

The analysis of user behavior in large video databases is an emerging problem. The growing importance of video in everyday life (for example, in movie production) increases the importance of video usage in turn. To cope with the abundance of available videos, users need intelligent software systems that fully exploit the rich information hidden in user behavior on large video databases in order to retrieve and navigate through videos. In this paper, we present a framework for video usage mining that generates user profiles on a video search engine in the context of movie production. We propose a two-level, model-based approach for modeling user behavior on a video search engine. The first level models and clusters user behavior on a single video sequence (intra-video behavior); the second models and clusters user behavior on a set of video sequences (inter-video behavior). Based on this representation, we have developed a two-phase clustering algorithm that fits these data. First results obtained from a test dataset show that taking intra-video behavior into account when clustering inter-video behavior produces more meaningful results.

Introduction


With the rapid development of video capture, storage, and distribution technologies, digital videos are more accessible than ever, and the volume of these archives is growing rapidly. Video usage mining, which aims at analyzing user behavior on a set of video data, is one of the key technologies for building tools that help people browse and search large amounts of video data. As in the field of web mining, the extracted information makes it possible to improve access to videos.

In this paper, we present a framework that combines intra-video usage mining and inter-video usage mining to generate user profiles on a video search engine in the context of movie production. Specifically, we have borrowed the idea of navigation history from the web browsers used in web usage mining, and we propose a novel approach that defines two types of log from the log data gathered on a video search engine. The first type concerns the way a user views a video sequence (play, pause, forward, and so on) and can be called intra-video usage mining. At this level, we define the “video sequence viewing” as a behavior unit. The second type of log tracks the transitions between video sequence viewings. This part groups requests, results, and successively viewed sequences. At this higher level, as in web mining, we introduce the “session” as a behavior unit.

This paper is organized as follows. We first present the related work in video usage mining. We then present our two-level, model-based approach for modeling user behavior on a video search engine and describe the evaluation of the technique on test datasets. Finally, we give the conclusion and some future directions.

Related work

In the absence of any prior survey of video usage mining, the closest related work can be classified into roughly two types. The first concerns the analysis of user behavior without considering the video content. These works report statistics of user behavior and frequency counts of video access.

For example, one study analyzes student usage of an educational multimedia system. The analysis is based on student personality types, since learning needs and expectations depend on the characteristics of the student's personality type. To this end, the authors developed a program that extracts student actions on the multimedia system and profiles what each user did each time he used the system. These user profiles include the following statistics: number of video viewing sessions; total seconds spent viewing videos; number of video viewing sessions that lasted more than 20 minutes; average duration of a video viewing session; average number of commands per minute during video viewing sessions; forward transitions; backward transitions; forward jumps; and jump ratio. Based on the statistics collected for each type of student, the authors analyze how the learning multimedia system can be improved to remedy its shortcomings.

Another study presents an analysis of trace data obtained from user accesses to videos on the web. The authors examine properties such as how user requests vary from day to day and whether video accesses exhibit any temporal properties. They propose to exploit these properties in the design of multimedia systems such as web video proxy caches and video servers. For example, the analysis revealed that users preview the initial portion of a video to find out whether they are interested. If they like it, they continue watching; otherwise, they stop. This pattern suggests that caching the first several minutes of video data should improve access performance.

The other type of work relates to behavior analysis on a single video.

One such work presents a framework that combines video content analysis and user log mining to generate a video summary. The authors develop a video browsing and summarization system that relies on the browsing logs of previous viewers to help future viewers. They adopt the link analysis technique used in web mining and propose the concept of ShotRank, which measures the importance of each video shot. User behavior is simulated with an Interest-guided Walk model, and the probability of a shot being visited is taken as an indication of the importance of that shot. The resulting ShotRank is used to organize the presentation of video shots and to generate video skims.

The shortcoming of the previous work is that it does not correlate the global behavior of users with their behavior on each video. It does not take into account the actions performed during a video viewing when considering navigation between video sequences. In short, these works are rather distant from our context: the concepts of navigation and search in a large video database are missing. Moreover, there are neither standards nor benchmarks for video log data.

Proposed approach


One of the needs of professional users in companies of the audio-visual sector is to find existing video sequences in order to reuse them in the creation of new films. Our approach is based on a well-suited video search engine. Our tool is a classical browser for finding video in huge databases. Searches are executed on content-based indexes. Much hidden information can be extracted from the usage and used to improve the meaningfulness of the videos returned by the search engine.

To achieve this task, we first need to define what a usage of a video search engine is. Such a behavior can be divided into three parts:
– 1° Request creation: the user defines his search attributes and values.
– 2° Result set exploitation: the sequences found are presented to the user, ordered by an attribute-based confidence value.
– 3° Selected sequences viewing: the user views the sequences he is interested in. This viewing is done with a video browser offering the usual functions (play, pause, forward, rewind, stop, jump).

Groups of viewed sequences form sessions. A session corresponds to one visit of a user and is composed of several searches and video sequence viewings.

Gathering data

All of these data are collected and written into log files. To create these files, we define an XML-based language. A session is recorded as follows: the first part contains the executed request and the list of returned video sequences; the second part logs the viewing of sequences.

Like a web log file, our video log file traces the actions of users. To extract sessions, we have developed an XSLT (Extensible Stylesheet Language Transformations) converter. This converter extracts and groups sessions from the log file in XML format. The following part of the paper explains how we propose to model a video session.
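The log language itself is not reproduced in this article, so the following Python sketch only illustrates the idea of regrouping a log into sessions. The element and attribute names (log, session, request, result, viewing, action, duration) are assumptions made for the example, not the authors' schema, and Python's standard XML parser stands in for the XSLT converter.

```python
import xml.etree.ElementTree as ET

# Hypothetical log layout: every element and attribute name is illustrative only.
SAMPLE_LOG = """
<log>
  <session id="s1">
    <request>
      <attribute name="keyword" value="mountain"/>
      <result video="18"/>
      <result video="73"/>
    </request>
    <viewing video="18">
      <action type="play" duration="12"/>
      <action type="pause" duration="3"/>
      <action type="stop" duration="1"/>
    </viewing>
    <viewing video="73">
      <action type="play" duration="40"/>
      <action type="stop" duration="1"/>
    </viewing>
  </session>
</log>
"""

def extract_sessions(xml_text):
    """Return {session id: [(video id, [one action per second, ...]), ...]}."""
    root = ET.fromstring(xml_text)
    sessions = {}
    for session in root.iter("session"):
        viewings = []
        for viewing in session.iter("viewing"):
            actions = []
            for action in viewing.iter("action"):
                # Expand each logged action into one symbol per second of viewing.
                actions.extend([action.get("type")] * int(action.get("duration")))
            viewings.append((viewing.get("video"), actions))
        sessions[session.get("id")] = viewings
    return sessions

sessions = extract_sessions(SAMPLE_LOG)
```

Expanding each action into one symbol per second is a choice made here so that the output can feed a per-second behavior model directly.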

Modeling user’s behavior: a two-level-based model

From the log data gathered previously, we generate two models to represent user behavior. The first one refers to the way a user views a video sequence (play, pause, forward, and so on). At this level, we define the “video sequence viewing” as a behavior unit. The second one tracks the transitions between video sequence viewings. This part groups requests, results, and successively viewed sequences. At this higher level, we introduce the “session” as a behavior unit. At present, our work is based only on sequences: we do not take into account the information given by the requests. This will be addressed in a later version.

A session is a list of viewed video sequences. The particularity and the interest of video log data lie in the ability to define the importance of each sequence in each session. Rather than a simple weight, comparable to time comparison in web mining, we characterize several behavior types (complete viewing, overview, open-close, precise scene viewing). Based on these behaviors, each viewing is precisely defined, and we are then able to determine how a video was used in a session.

Modeling and clustering intra video user’s behavior

An intra-video user behavior is modeled by a first-order non-hidden Markovian model. This model represents the probability of executing an action at each second of the viewing of a video. Each vertex represents one of the actions offered to the user during a viewing. For the first version of our tool, these actions are play, pause, forward, rewind, jump, and stop. For example, the edge from play to pause means that when a user is playing a video, there is a probability of 8% that he executes a pause during the next second. The limited complexity of this model allows us to propose an effective method for clustering these behaviors.
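As a minimal sketch of such a model, the transition probabilities can be estimated by counting, for each action, which action follows it one second later. The function name, the smoothing constant, and the sample viewing below are assumptions made for illustration; the article does not state how the probabilities are estimated.

```python
from collections import Counter

ACTIONS = ["play", "pause", "forward", "rewind", "jump", "stop"]

def estimate_transitions(viewings, smoothing=1e-3):
    """Estimate P(next action | current action) from per-second action lists.

    The small additive smoothing (an assumption) keeps unseen transitions
    from having probability zero.
    """
    counts = Counter()
    for actions in viewings:
        for current, following in zip(actions, actions[1:]):
            counts[(current, following)] += 1
    model = {}
    for a in ACTIONS:
        row_total = sum(counts[(a, b)] for b in ACTIONS) + smoothing * len(ACTIONS)
        model[a] = {b: (counts[(a, b)] + smoothing) / row_total for b in ACTIONS}
    return model

# One hypothetical viewing, one symbol per second.
viewing = ["play"] * 23 + ["pause"] * 2 + ["play"] * 3 + ["stop"]
model = estimate_transitions([viewing])
print(round(model["play"]["pause"], 3))  # weight of the edge from play to pause
```

Each row of the resulting table sums to one, so a row can be read as the outgoing edges of one vertex of the model.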

We introduce here the K-Models clustering algorithm. This technique is essentially an adaptation of the well-known K-Means algorithm that uses models instead of means. We try to find K clusters in a set of viewings (each a list of the actions performed by a user while viewing a video sequence) by partitioning the space. Each cluster is represented by one of the models described above. The difference lies in the use of probability instead of distance to associate viewings with clusters: we calculate the probability that a viewing has been generated by each model and then assign the viewing to the cluster with the highest probability.
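The loop can be sketched as follows, assuming that each cluster model is re-estimated from the viewings currently assigned to it, in the same way that K-Means recomputes its means. The random initialization, the smoothing constant, the handling of empty clusters, and all function names are assumptions made for this example rather than details given in the article.

```python
import math
import random
from collections import Counter

ACTIONS = ["play", "pause", "forward", "rewind", "jump", "stop"]

def fit_model(viewings, smoothing=1e-3):
    """First-order transition probabilities estimated from a group of viewings."""
    counts = Counter()
    for actions in viewings:
        counts.update(zip(actions, actions[1:]))
    model = {}
    for a in ACTIONS:
        total = sum(counts[(a, b)] for b in ACTIONS) + smoothing * len(ACTIONS)
        model[a] = {b: (counts[(a, b)] + smoothing) / total for b in ACTIONS}
    return model

def log_likelihood(actions, model):
    """Log-probability that the model generated this sequence of actions."""
    return sum(math.log(model[a][b]) for a, b in zip(actions, actions[1:]))

def k_models(viewings, k, iterations=20, seed=0):
    """Cluster viewings around k Markov models (iterations must be at least 1)."""
    rng = random.Random(seed)
    # Assumed initialization: random assignment of viewings to clusters.
    labels = [rng.randrange(k) for _ in viewings]
    for _ in range(iterations):
        models = []
        for c in range(k):
            members = [v for v, lab in zip(viewings, labels) if lab == c]
            # An empty cluster is re-seeded with one random viewing (assumption).
            models.append(fit_model(members or [rng.choice(viewings)]))
        # Assign each viewing to the model most likely to have generated it.
        new_labels = [max(range(k), key=lambda c: log_likelihood(v, models[c]))
                      for v in viewings]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, models

complete = ["play"] * 30 + ["stop"]
skimming = ["play", "forward", "forward", "jump"] * 5 + ["stop"]
labels, models = k_models([complete] * 3 + [skimming] * 3, k=2)
```

Log-probabilities are used instead of raw probabilities only to avoid numerical underflow on long viewings; the cluster chosen for each viewing is the same.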

Based on these discovered models, we create a vector of behavior for each viewing. This vector corresponds to the probabilities that the viewing has been generated by each model.
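One way to obtain such a vector is to normalize the likelihoods of the viewing under each model so that they sum to one. The normalization, which amounts to assuming that all models are equally likely a priori, is an assumption of this sketch, as are the two toy models and their values.

```python
import math

def log_likelihood(actions, model):
    """Log-probability that the model generated this sequence of actions."""
    return sum(math.log(model[a][b]) for a, b in zip(actions, actions[1:]))

def behavior_vector(actions, models):
    """Probability of each model given the viewing, with a uniform prior (assumption)."""
    scores = [log_likelihood(actions, m) for m in models]
    top = max(scores)
    # Subtracting the largest score before exponentiating avoids underflow.
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Two toy models over a reduced action set; the values are illustrative only.
complete_viewing = {"play": {"play": 0.97, "jump": 0.01, "stop": 0.02},
                    "jump": {"play": 0.90, "jump": 0.05, "stop": 0.05}}
skimming = {"play": {"play": 0.60, "jump": 0.35, "stop": 0.05},
            "jump": {"play": 0.70, "jump": 0.25, "stop": 0.05}}

vector = behavior_vector(["play"] * 20 + ["stop"], [complete_viewing, skimming])
print(vector)
```

A long uninterrupted viewing puts almost all of its weight on the first component, while a viewing full of jumps shifts weight to the second.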

Modeling and clustering inter video user’s behavior

From the initial dataset and the vectors created by the intra-video clustering, we construct a sequential representation of the sessions. A session is a time-ordered sequence of viewed videos. Each viewing is characterized by a pair made of the unique identifier of the video and the behavior vector connected to it. Based on this representation of sessions, we have developed a clustering algorithm that fulfills the following requirements:
– Any element belonging to a cluster has a common part with any other element of the cluster.
– The generality level of the produced clusters relies on the definition of some parameters given by the user. These are presented next.

These requirements lead us to define the representation of a cluster as follows: a cluster c is represented by a set S_c of S sessions of minimal length l. A session s is attributed to the cluster if it matches at least p of the S sessions. The session s matches a session of S_c if the latter is a subsequence extracted from s, that is, if its elements occur in s in the same order.
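The membership test can be sketched as follows. This example assumes that gaps are allowed between matched elements and that each viewing has been reduced to a pair made of a video identifier and a discrete behavior label; both are assumptions, since the formal definition is not reproduced here, and the function names and sample data are hypothetical.

```python
def is_subsequence(pattern, session):
    """True if the items of pattern occur in session in the same order (gaps allowed)."""
    remaining = iter(session)
    # Each membership test consumes the iterator up to the matched item,
    # so later items can only match further along the session.
    return all(item in remaining for item in pattern)

def belongs_to(session, cluster_sessions, p):
    """A session joins the cluster if it matches at least p of its representative sessions."""
    matches = sum(is_subsequence(pattern, session) for pattern in cluster_sessions)
    return matches >= p

# Hypothetical data: (video identifier, behavior label) pairs.
session = [(18, "complete"), (73, "overview"), (29, "complete"), (41, "open-close")]
cluster = [
    [(18, "complete"), (29, "complete")],
    [(73, "overview"), (41, "open-close")],
    [(18, "overview"), (29, "complete")],  # same videos, different behavior on video 18
]

print(belongs_to(session, cluster, p=2))
print(belongs_to(session, cluster, p=3))
```

Because the behavior label is part of each item, two sessions that visit the same videos but view them differently do not match, which is the effect the two-level model is meant to capture.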

This way, we ensure the homogeneity of clusters and the fact that any two elements of a cluster share a common factor. Hence, we avoid obtaining clusters composed of completely different elements connected by a chain of nearest neighbors, as distance-based clustering techniques generally produce. The parameters S, l, and p introduced above are set by the analyst to retrieve clusters of the required homogeneity.

The clustering algorithm itself is a classical hierarchical clustering algorithm. It starts by considering small groups of sessions as clusters and iteratively merges the two nearest clusters. The algorithm ends when the required level of homogeneity has been reached.
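The following sketch shows one possible reading of this procedure under strong simplifications: each cluster keeps a single common subsequence as its representative (that is, S = 1 and p = 1), the two "nearest" clusters are taken to be those whose representatives share the longest common subsequence, and merging stops when that common part would become shorter than l. None of these choices is stated in the article; they are assumptions made to keep the example short.

```python
from itertools import combinations

def lcs(a, b):
    """Longest common subsequence of two sequences (dynamic programming)."""
    table = [[[] for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                table[i + 1][j + 1] = table[i][j] + [x]
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j], key=len)
    return table[len(a)][len(b)]

def cluster_sessions(sessions, min_length):
    """Agglomerative clustering; every member contains its cluster's common subsequence."""
    # Each cluster is (member indices, subsequence shared by all members).
    clusters = [([i], list(s)) for i, s in enumerate(sessions)]
    while len(clusters) > 1:
        # Find the pair of clusters whose representatives share the longest part.
        (i, j), common = max(
            (((i, j), lcs(clusters[i][1], clusters[j][1]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: len(pair[1]),
        )
        if len(common) < min_length:
            break  # merging further would violate the homogeneity requirement
        merged = (clusters[i][0] + clusters[j][0], common)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Hypothetical sessions given as sequences of video identifiers.
sessions = [[18, 73, 29, 41], [18, 29, 41, 7], [5, 6, 8], [5, 9, 6, 8], [18, 3, 29, 41]]
clusters = cluster_sessions(sessions, min_length=3)
print(clusters)
```

Because the representative of a merged cluster is a subsequence of both former representatives, every session in a cluster contains it, so no cluster can be held together only by a chain of neighbors.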


Two points emerge from the first experiments. First, the grouped sessions are homogeneous; in particular, we are able to avoid wrong merges, whereas classical distance-based clustering can assign two completely different sessions to the same cluster. Second, more than groups of sessions, we produce profiles that represent well the sessions that generated them, as illustrated in Figure 4.

Figure 4 presents two generated visit profiles. Each one is composed of sequences of video identifiers, which represent the different videos that a user viewed during a visit. The first sequences of the two profiles are quite similar, yet the data are not grouped together because they exhibit different intra-video behavior. Introducing intra-video behavior and its analysis into the complete session clustering process gives more precise results: we are able to differentiate groups of sessions that explored the same videos in a different manner.

Figure 5 shows that taking intra-video behavior into account by using Markovian models does not penalize execution time too much. Even though execution takes longer when they are used, the evolution of execution time is quite similar.

Future work and research

In this paper, we propose a two-level, model-based approach for modeling user behavior on a video search engine. The first level models and clusters user behavior on a single video sequence (intra-video behavior); the second models and clusters user behavior on a set of video sequences (inter-video behavior). Based on this representation, we have developed a two-phase clustering algorithm that fits these data. First results obtained from a test dataset show that taking intra-video behavior into account when clustering inter-video behavior produces more meaningful results.

The main remaining work is to validate our technique on real datasets in the context of movie production. We have to ensure that the results remain interesting on a huge video database explored by many users.

Another objective is to model clusters using frequent item sets instead of subsequences and to compare the results in order to measure the impact of the sequential representation.
