
Content-Based Photo Album Management Using Faces - Abstract, Introduction, The digital photo album, Building composite images to represent the folders


Mohamed Abdel-Mottaleb and Longbin Chen
University of Miami, ECE Department, Coral Gables, Florida, USA

Definition: The system automatically detects and stores information about the locations of faces in the photos.


Abstract

Photo album management, an application of content-based image browsing and retrieval, has attracted attention in recent years. The identities of the individuals appearing in the photos are the most important aspect for photo browsing. However, face recognition generally does not work effectively in this case because of the large variations in pose and illumination and the sometimes poor quality of the images. In this paper, we present a system for browsing photo albums. A similarity function based on face arrangement is defined. Photos are then clustered based on the similarity function using a clustering algorithm proposed in this paper. The system also represents the photos of an event using a composite image. The composite image is built from representative faces and an image that represents the event. Experiments indicate that the face arrangement features are effective in representing the semantic content of the photos and are appropriate for photo albums.


Introduction

Various systems have been developed for content-based image retrieval, among them QBIC, CONIVAS, VisualSEEk, and Netra. These systems represent image content using a set of low-level attributes, such as color, texture, shape, layout, and motion. Retrieval is performed by comparing the features of a query image with the corresponding features of the images stored in a database and presenting the user with the database images that have the most similar features. However, low-level features generally cannot properly represent image content. Limiting the application domain enables the use of domain knowledge in building sophisticated tools for the specific application. Recently, there has been some work on family photo album management. In the case of photo albums, some metadata, such as date and time, are readily available. In addition, the faces of individuals play an important role in family photos and can be used in building semantic photo album browsing and retrieval tools.

Users usually remember photos by date, scene type and the individuals in the photos. Date and other metadata can be obtained from the photos. Therefore, to automate photo album management, the “who is in the photo” needs to be obtained automatically. However, automatic face recognition usually does not work well in this case because of the large variations in pose and illumination and the sometimes poor photo quality. Therefore, other individual-related features, instead of identity, should be used for representation. Earlier work proposed a semi-automatic face annotation mechanism in which “cloth features” replace facial features for identifying individuals who appear in multiple photos of the same scene or event. Later work extended the idea with a Bayesian framework that identifies people from image to image in photo albums, using features such as image date and other face features.

In this paper, we present a system for image browsing. The system represents the photos in each event by a synthesized composite image. The composite image is built from an image that represents the event as well as the faces of people who appear in the event. In fact, the individuals involved in an event can be a very discriminating feature for recognizing the event. When their faces are combined with a representative image that shows the background, the user can easily recognize the event. The system also clusters the images within a single event based on face arrangement, which includes the locations of the faces, their sizes and identities. The goal of this clustering is to allow the user to browse through the photos of the event.

The system described in this article has the following features:

1) Automatic Face Detection with probabilistic output;

2) An effective similarity measurement function based on face arrangement;

3) Event representation using composite images that include faces which appear in the event;

4) Clustering based on the similarity metric to facilitate browsing; each cluster contains photos that could be displayed in one screen.

The digital photo album

During the archiving phase, a fast and robust face detector, based on previously published algorithms, is applied to the images. This face detector can a) detect faces with large yaw rotation and in-plane rotation and b) produce face detection results with probabilistic measures that indicate the confidence in the detection accuracy. The system allows the user to correct the results of the face detection if there are any false positives or false negatives, which are usually rare. The face detection results are then archived in the database. The results are also used to create composite images that represent the folders of images in the database. These composite images are used for browsing. In this paper, we assume that each folder contains images from the same event.

Figure 1 shows the browsing interface of our system. The interface is composed of three panels. The left panel is the folder browsing section, where folders are listed using the folder representative images. The middle panel is the image browsing section, where all images in one folder are listed by cluster. The photos in the same folder are clustered based on face arrangement similarity, so that each cluster contains exactly the number of photos that can be shown in one screen. Using the scroll bar, the user can browse photos of different clusters in the folder. These clusters are sorted by average number of faces. The right panel is where images could be viewed in more detail.

Building composite images to represent the folders

Users tend to remember an event by WHEN and WHERE it happened and WHO participated in that event. The Date and Time of an event could be extracted from digital photos’ metadata and represented as text. The place of the event could be reasonably represented by a photo of the scene. However, it is not easy to automatically select such a representative photo for the event.

In this paper, we propose to build a composite image to represent the photos in a folder. The composite image is built from an image from the folder that has the largest number of important faces as well as a set of representative faces from the folder (Figure 2). This criterion is natural and usually effective, especially when the user is searching for photos of someone. If the photo’s orientation is landscape, the left frame in Figure 2 is used; otherwise, the right one is used.

The representative faces should be selected according to the following criteria. The faces must be the important ones in that folder in the sense that they should carry as much information as possible because the number of representative faces is limited to a certain number (in our system, the number is four). The choice of important faces depends on the definition of “important”. In fact, “important” is a semantic definition which has to be measured quantitatively. In our system, three factors are considered together to evaluate the importance of every face: the size of the face, the position of the face in the photo and the confidence of face detection result. Since faces are detected from all the images in a folder, the face of a person is usually detected multiple times. Therefore, we need to identify these cases in order to represent the folder with faces of different people.
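The three importance factors named above can be combined in many ways, and the article does not give a formula. The following Python sketch is only one plausible combination, under stated assumptions: the `Face` record, the multiplicative form, and the use of distance from the image center as the "position" factor are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Face:
    """Hypothetical record for one detected face (pixel units)."""
    x: float           # left edge of the bounding rectangle
    y: float           # top edge of the bounding rectangle
    w: float           # rectangle width
    h: float           # rectangle height
    confidence: float  # detector confidence, assumed to lie in [0, 1]


def importance(face: Face, img_w: float, img_h: float) -> float:
    """Assumed score: confidence * relative size * centrality."""
    # Size factor: fraction of the photo covered by the face rectangle.
    size = (face.w * face.h) / (img_w * img_h)
    # Position factor: 1 at the image center, 0 at a corner (an assumption).
    cx = (face.x + face.w / 2) / img_w
    cy = (face.y + face.h / 2) / img_h
    offset = ((cx - 0.5) ** 2 + (cy - 0.5) ** 2) ** 0.5 / (0.5 ** 0.5)
    centrality = 1.0 - offset
    return face.confidence * size * centrality


# A large centered face should outrank a small face in a corner.
big = importance(Face(30, 30, 40, 40, 0.9), 100, 100)
small = importance(Face(0, 0, 10, 10, 0.9), 100, 100)
```

A product is used here so that a face scoring near zero on any single factor is unlikely to be selected; a weighted sum would be an equally reasonable assumption.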

It is usually the case that each individual wears the same clothes throughout a single event; therefore, the color and texture features of the clothes can be used to roughly identify individuals. The faces of these individuals are then used in creating a composite image that represents the event. Figure 3 shows the algorithm for generating a representative image for a folder. To extract the texture feature of the clothes, we used the auto-correlogram, which provides good features for describing texture in color images. Compared with other color texture features, it 1) describes the spatial correlation of colors, 2) is easy to compute and 3) has a relatively small feature size. The correlogram features are built using matrices similar to the co-occurrence matrices used for describing texture. The difference is that the matrices here are built based on the absolute distance between the pixels, without considering the direction.
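As a rough illustration of the auto-correlogram idea, the sketch below estimates, for each quantized color and each distance, how often a pixel at that distance from a pixel of the color has the same color. It assumes the image has already been quantized to a small number of color indices and that distance is measured as chessboard distance in all directions; the function names, the distance set and the quantization are illustrative assumptions, not details taken from the article.

```python
def ring_offsets(d):
    """All (dy, dx) offsets at chessboard distance exactly d, in every direction."""
    return [(dy, dx)
            for dy in range(-d, d + 1)
            for dx in range(-d, d + 1)
            if max(abs(dy), abs(dx)) == d]


def auto_correlogram(pixels, n_colors, distances=(1, 3, 5)):
    """pixels: list of rows of quantized color indices in range(n_colors).

    Returns a flat feature list ordered by distance, then by color.
    """
    h, w = len(pixels), len(pixels[0])
    feature = []
    for d in distances:
        offsets = ring_offsets(d)
        same = [0] * n_colors   # same-color neighbor pairs, per color
        total = [0] * n_colors  # all in-bounds neighbor pairs, per color
        for y in range(h):
            for x in range(w):
                c = pixels[y][x]
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total[c] += 1
                        if pixels[ny][nx] == c:
                            same[c] += 1
        # Probability estimate; colors that never occur contribute 0.
        feature.extend(s / t if t else 0.0 for s, t in zip(same, total))
    return feature


# A uniformly colored patch is perfectly self-correlated at every distance.
flat = [[0] * 8 for _ in range(8)]
flat_feature = auto_correlogram(flat, n_colors=2, distances=(1, 2))

# A checkerboard patch mixes same-color and different-color neighbors.
checker = [[(x + y) % 2 for x in range(8)] for y in range(8)]
checker_feature = auto_correlogram(checker, n_colors=2, distances=(1,))
```

The feature length is the number of colors times the number of distances, which is what keeps the descriptor small compared with a full co-occurrence matrix per displacement.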

Browsing folders by Face Arrangement

We describe our measure of distance/similarity between images based on face arrangement. This measure is used in clustering the images of a folder for browsing, as explained later in the article.

Similarity Function Based on Face Arrangements

There are several approaches for representing spatial relations between objects in images. In this paper, our goal is to establish similarity between images based on the number of faces, their sizes and their locations. Each face is represented by its bounding rectangle. To calculate the distance between two images, correspondence between the faces in the images has to be established, as shown in Figure 4. The distance between two images is defined as the weighted sum of four measures, T_N, T_D, T_A and T_ov:

T_D measures the relative spatial location:

T_A measures the average ratio of the area of corresponding faces:

T_ov measures the average of the ratios of overlapped areas between corresponding faces:

T_N is a measure that captures the difference in the number of faces between the query and a test image:

where I represents an image in the database and Q represents the query image, I_N and Q_N are the numbers of faces in images I and Q, and M is min(I_N, Q_N). In the experiments we used α = 0.2, β = 0.1 and γ = 0.1, and the similarity is obtained by the following equation:

Correspondence between objects (Figure 4) is established by comparing every object in one image with all the objects in the other image and choosing the pairing that minimizes T_D.
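The correspondence step can be sketched as a nearest-neighbor matching between face rectangles. Because the exact definition of the spatial-location measure is not reproduced in this text, the sketch below substitutes the Euclidean distance between rectangle centers (in coordinates normalized to the image size) as a stand-in, and it assumes a greedy one-to-one assignment. Both choices are assumptions, and `match_faces` is a hypothetical helper name.

```python
def center(rect):
    """Center of an (x, y, w, h) rectangle in normalized image coordinates."""
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def center_distance(a, b):
    """Stand-in for the spatial-location measure: distance between centers."""
    (ax, ay), (bx, by) = center(a), center(b)
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5


def match_faces(faces_a, faces_b):
    """Greedily pair each face in the smaller set with its closest unused face.

    Returns a list of (index_in_a, index_in_b) pairs; the number of pairs is
    min(len(faces_a), len(faces_b)).
    """
    swapped = len(faces_a) > len(faces_b)
    small, large = (faces_b, faces_a) if swapped else (faces_a, faces_b)
    unused = set(range(len(large)))
    pairs = []
    for i, face in enumerate(small):
        j = min(unused, key=lambda k: center_distance(face, large[k]))
        unused.remove(j)
        pairs.append((j, i) if swapped else (i, j))
    return pairs


query = [(0.10, 0.10, 0.20, 0.20), (0.60, 0.10, 0.20, 0.20)]
stored = [(0.62, 0.12, 0.20, 0.20), (0.10, 0.12, 0.20, 0.20)]
pairs = match_faces(query, stored)
```

Once the pairs are known, per-pair quantities such as area ratios and overlap ratios can be averaged over the matched faces, as the measures above describe.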

Clustering the Photo Images According to Face Arrangement

When the number of photos in a single folder is larger than the maximum number of images that can be displayed in one screen, the photos should be arranged in some order to facilitate browsing. This is achieved by clustering the photos based on the arrangement of the faces such that each cluster contains photos that can be displayed in one screen. Image clustering has been used in databases before; in this article, however, the clustering is based on the face arrangement similarity.

Our distance/similarity measure of face arrangement is non-metric. Therefore, general metric-based clustering algorithms (e.g., k-means) cannot be applied in our system. To cluster the photos in a folder based on this measure, we propose the clustering algorithm in Figure 5.

After the clusters of photos are built, they are sorted by the average number of faces in each cluster so that they can be browsed in this order.
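The algorithm of Figure 5 is not reproduced in this text, so the following Python sketch is not the authors' method. It only illustrates how screen-sized clusters can be formed from pairwise distances alone, without centroids or the triangle inequality, and how the clusters can then be ordered by average face count. The greedy seeding strategy, the function names and the ascending sort order are all assumptions.

```python
def cluster_for_screens(photo_ids, distance, per_screen):
    """Greedy grouping: each seed photo takes its nearest unassigned photos.

    Only pairwise distance(a, b) calls are used, so a non-metric measure works.
    """
    unassigned = list(photo_ids)
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        # Stable sort keeps the original order among equally distant photos.
        unassigned.sort(key=lambda p: distance(seed, p))
        clusters.append([seed] + unassigned[:per_screen - 1])
        unassigned = unassigned[per_screen - 1:]
    return clusters


def sort_by_average_faces(clusters, face_count):
    """Order clusters by their mean number of faces (ascending is assumed)."""
    return sorted(clusters,
                  key=lambda c: sum(face_count[p] for p in c) / len(c))


# Toy data: photo id -> number of detected faces.
face_count = {0: 1, 1: 1, 2: 5, 3: 5, 4: 1, 5: 5}
toy_distance = lambda a, b: abs(face_count[a] - face_count[b])
clusters = cluster_for_screens(range(6), toy_distance, per_screen=3)
ordered = sort_by_average_faces(clusters, face_count)
```

In this toy run the photos with one face and the photos with five faces end up on separate screens; a real system would pass the face arrangement distance in place of `toy_distance`.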

Representative Images of Folder

After the faces are detected, the system extracts the body patches from the photos. The body patches are then clustered using their correlogram features (see Figure 6). Although the patches in Figure 6.a and Figure 6.d are from the same individual, she changed clothes, so clustering produces two clusters for that person. Although this is not desired, it does not affect the system too much.

Ideally, the selected sub-representative faces should belong to as many different individuals as possible. This is achieved by assuming that the individuals wear the same clothes during the time when all these photos are taken. This happens quite often, although it is not always true, as in Figure 6. In our experiments, we used folders of photos that were taken on the same day and in which the individuals did not change their clothes.
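Selecting faces of different individuals can be sketched as keeping the best-scoring face from each clothing cluster and then taking the top few. The sketch assumes each detected face already carries an importance score and a clothing-cluster label computed elsewhere; the tuple layout and the function name are hypothetical, while the limit of four matches the number stated for the system.

```python
def pick_representative_faces(candidates, limit=4):
    """candidates: iterable of (importance, clothing_cluster, face_id).

    Keeps at most one face per clothing cluster, then returns the face ids of
    the `limit` highest-scoring survivors, best first.
    """
    best = {}
    for score, cluster, face_id in candidates:
        if cluster not in best or score > best[cluster][0]:
            best[cluster] = (score, face_id)
    ranked = sorted(best.values(), key=lambda item: item[0], reverse=True)
    return [face_id for _, face_id in ranked[:limit]]


candidates = [
    (0.90, "red-shirt", "f1"),
    (0.80, "red-shirt", "f2"),   # same clothes as f1, so treated as a duplicate
    (0.70, "blue-dress", "f3"),
    (0.20, "green-coat", "f4"),
    (0.60, "grey-suit", "f5"),
    (0.10, "white-top", "f6"),
]
chosen = pick_representative_faces(candidates)
```

A person who changes clothes during the event would occupy two clusters and could be selected twice, which is the limitation illustrated by Figure 6.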

Figure 7 shows two representative folder images generated by our system. The results are compared with the “Thumbnail” views used in Windows XP Explorer. The first folder contains 78 photos from a party. Our algorithm selected a representative photo that has the largest number of faces and selected four faces as sub-photos (see the algorithm in Figure 3) to build the composite image, while Windows Explorer selected some photos that show furniture and are not good representatives of the event. The second folder contains photos from a trip that includes an outdoor event and a gathering in a cabin. Our algorithm selected a representative photo that contains the faces of most individuals in that event and attached to it four different faces from the event.

Face Arrangement Similarity and Clustering

We conducted a subjective evaluation of the face arrangement similarity function described in a previous section. The database contained about 3500 images that were archived based on the locations and sizes of the faces. Queries were either images or sketches that contain bounding rectangles at the possible positions of the faces. The retrieved images were judged as either relevant or irrelevant by a subject who is not familiar with the system. Then, a performance measure was calculated. First we explain the performance measure and then present the results.

We used a previously published performance measure, which is defined to reflect the rank position (R_p) of the relevant images:

where N is the number of images retrieved for browsing (in our experiment we looked at the top 50 images), N_R is the number of relevant images for the query, and R_i is the rank at which the i-th relevant image is retrieved. This measure is 0 for perfect performance and approaches 1 as performance decreases.
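The formula itself is missing from this text. The Python sketch below shows one normalized rank measure that is consistent with the description (0 when all relevant images are ranked first, approaching 1 as they move toward the end of the retrieved list). It is an assumed form, not necessarily the exact expression used in the experiments, and it takes ranks to be 1-based.

```python
def normalized_rank(relevant_ranks, n_retrieved):
    """Assumed rank-position measure.

    relevant_ranks: 1-based ranks R_i at which the relevant images appear.
    n_retrieved:    N, the number of images retrieved for browsing.
    """
    n_rel = len(relevant_ranks)
    if n_rel == 0:
        raise ValueError("at least one relevant image is required")
    # Sum of ranks when the relevant images occupy positions 1..N_R.
    ideal = n_rel * (n_rel + 1) / 2
    return (sum(relevant_ranks) - ideal) / (n_retrieved * n_rel)


perfect = normalized_rank([1, 2, 3], n_retrieved=50)
worst = normalized_rank([48, 49, 50], n_retrieved=50)
```

With three relevant images in the top 50, the perfect ordering scores 0 and the worst ordering scores 0.94, matching the stated behavior of the measure.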

The query images contained up to four faces in different positions. The average R_p values are shown in Table 1 for queries with different numbers of faces. As shown in the table, the average R_p values are small, which indicates good retrieval performance.

The average time to retrieve a photo by our system depends on the size of the photo albums. For a typical photo album containing 50 folders and each folder containing 256 photos, the number of photos that should be initially viewed is the 50 representative images of the folders. Then, the user could select the right photo clusters and then look for the right photos in the selected cluster.


We presented a system for archiving and browsing photos in a digital photo album. The system allows for browsing based on faces and their arrangements and represents events with a set of composite images. As demonstrated in the experiments, the results are intuitive and make sense from the user’s point of view. The archiving process is fully automated.
