
Analyzing Person Information in News Video - Introduction, Conclusions and Future Directions


Shin’ichi Satoh
National Institute of Informatics, Tokyo, Japan

Definition: Analyzing person information in news video covers tasks that identify the people appearing in the video, such as face detection and recognition, face-name association, and others.


Person information analysis for news videos, including face detection and recognition and face-name association, has attracted many researchers in the video indexing field. One reason is the importance of person information. In social interactions, we use the face as symbolic information to identify each other. This raises the importance of faces among the many types of visual information, and face image processing has therefore been studied intensively for decades by image processing and computer vision researchers. As a result, robust face detection and recognition techniques have been proposed, and face information in news videos is more easily accessible than other types of visual information.

In addition, person information is the most important kind of information in news: “who said this?”, “who went there?”, and “who did this?” are among the major questions that news answers. Among all such types of person information, “who is this?”, i.e., face-name association, is the most basic as well as the most important. Despite its basic nature, face-name association is not an easy task for computers; in some cases it requires in-depth semantic analysis of videos, which has not yet been achieved even by the most advanced technologies. This is another reason why face-name association still attracts many researchers: it is a good touchstone of video analysis technologies.

This article describes face-name association in news videos. We take one of the earliest attempts, Name-It, as an example and briefly describe its mechanism. We then compare it with corpus-based natural language processing and information retrieval techniques, and show the effectiveness of corpus-based video analysis.

Conclusions and Future Directions

This article has described face-name association in videos, especially Name-It, in order to demonstrate the effectiveness of corpus-based video analysis. There are several potential directions for enhancing and extending corpus-based face-name association. One is to improve the component technologies: name extraction, face extraction, and face matching. Recent advances in information extraction and natural language processing enable almost perfect name extraction from text. They can also provide further information, such as the roles of names in sentences and documents, which should further improve face-name association performance.

Advanced image processing and computer vision techniques will enhance the quality of the symbolization of faces in a video corpus. Robust face detection and tracking in videos is still a challenging task, and comprehensive surveys of face detection have been published. Robust and accurate face matching will rectify the occurrence patterns of faces (Figure 4), which in turn enhances face-name association. Many research efforts have been made in face recognition, especially for surveillance and biometrics, and comprehensive surveys of face recognition have also been published. Face recognition for videos could be the next frontier. In addition to face detection and recognition, behavior analysis is also helpful, especially for associating observed behavior with a person's activity as described in text.

The use of other modalities is also promising. In addition to images, closed-caption text, and video captions, speaker identification provides a powerful cue for face-name association in monologue shots.

In integrating face and name detection results, Name-It uses co-occurrence, which is based on coincidence. However, as mentioned before, news videos are made to be concise and easy for people to understand, so the relationship between corresponding faces and names is not as simple as coincidence and may follow a kind of video grammar. To handle this, the system ultimately needs to “understand” videos as people do. One attempt to model this relationship as a temporal probability distribution has been presented. To enhance the integration, we need a much more elaborate video grammar that intelligently integrates text processing results and image processing results.
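As a rough illustration of co-occurrence based on coincidence, the following Python sketch scores face-name pairs by how often they appear in the same news segment. It is not Name-It's actual co-occurrence measure: the segment representation (a set of face cluster IDs plus a set of extracted names), the function names, and the normalization by the geometric mean of the individual counts are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
from math import sqrt


def cooccurrence_scores(segments):
    """Score every (face, name) pair seen together in at least one segment.

    `segments` is an iterable of (face_ids, names) pairs, one per news
    segment. Both the input layout and the normalization are hypothetical.
    """
    face_counts = Counter()
    name_counts = Counter()
    pair_counts = Counter()
    for faces, names in segments:
        faces, names = set(faces), set(names)
        face_counts.update(faces)
        name_counts.update(names)
        # Every face and name in the same segment counts as one coincidence.
        pair_counts.update(product(faces, names))
    # Normalize so that frequent names do not dominate purely by frequency.
    return {
        (face, name): count / sqrt(face_counts[face] * name_counts[name])
        for (face, name), count in pair_counts.items()
    }


def best_name(scores, face_id):
    """Return the highest-scoring name for a face, or None if unseen."""
    candidates = {n: s for (f, n), s in scores.items() if f == face_id}
    return max(candidates, key=candidates.get) if candidates else None


# Toy corpus: face cluster IDs and names are made up for illustration.
segments = [
    ({"face_1"}, {"Alice", "Bob"}),
    ({"face_1"}, {"Alice"}),
    ({"face_2"}, {"Bob"}),
]
scores = cooccurrence_scores(segments)
print(best_name(scores, "face_1"))
print(best_name(scores, "face_2"))
```

A score of this kind captures only coincidence within a segment. It ignores timing and the role a name plays in a sentence, which is exactly the limitation that a richer video grammar would need to address.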

It would be beneficial to apply the corpus-based video analysis approach to general objects in addition to faces. However, it is clearly not feasible to build detectors and recognizers for many types of objects. One promising alternative has been presented: the method extracts interest points from videos and calculates visual features for each point. These points are then clustered by their features into “words,” and a text retrieval technique is applied to retrieve objects from videos. In this way, the method symbolizes objects shown in videos as “words,” which could be useful for extending corpus-based video analysis to general objects.
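The visual “words” pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the cited method: the vocabulary of cluster centroids is assumed to have been learned already (for example by k-means), the descriptors are toy 2-D tuples rather than real interest-point features, and TF-IDF with cosine similarity stands in for whichever text retrieval technique is actually used. All function names are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def quantize(descriptor, vocabulary):
    """Return the index of the nearest centroid, i.e. the visual word."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[i])),
    )


def to_words(descriptors, vocabulary):
    """Turn a list of descriptors into a bag (histogram) of visual words."""
    return Counter(quantize(d, vocabulary) for d in descriptors)


def tfidf(bag, doc_freq, n_docs):
    """Weight a bag of words by TF-IDF; words unseen in the corpus are dropped."""
    total = sum(bag.values())
    return {
        w: (c / total) * log(n_docs / doc_freq[w])
        for w, c in bag.items()
        if doc_freq.get(w)
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_shots(query_descriptors, shots, vocabulary):
    """Rank shot IDs by similarity of their visual words to the query region."""
    bags = {sid: to_words(desc, vocabulary) for sid, desc in shots.items()}
    doc_freq = Counter(w for bag in bags.values() for w in bag)
    n_docs = len(bags)
    vectors = {sid: tfidf(bag, doc_freq, n_docs) for sid, bag in bags.items()}
    query = tfidf(to_words(query_descriptors, vocabulary), doc_freq, n_docs)
    return sorted(vectors, key=lambda sid: cosine(query, vectors[sid]), reverse=True)


# Toy data: three centroids and three shots with made-up 2-D descriptors.
vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
shots = {
    "shot_1": [(0.1, 0.2), (0.3, 0.1)],
    "shot_2": [(9.8, 10.1), (10.2, 9.9)],
    "shot_3": [(0.2, 9.7), (9.9, 10.0)],
}
print(rank_shots([(10.1, 10.0)], shots, vocabulary))
```

Once each shot is reduced to a bag of word IDs, the retrieval step no longer depends on image data at all, which is what allows text retrieval machinery to be reused for objects in video.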
