
Face Recognition - Face Detection, Global Approaches for, Feature Based Techniques, Problems and Considerations, Conclusions and Future Developments


Filareti Tsalakanidou, Sotiris Malassiotis and Michael G. Strintzis
Aristotle University of Thessaloniki, Thessaloniki, Greece
Informatics and Telematics Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece

Definition: A face recognition system recognizes an individual by matching the input image against images of all users in a database and finding the best match.

Face recognition has received significant attention in the last 15 years, due to the increasing number of commercial and law enforcement applications requiring reliable personal authentication (e.g. access control, surveillance of people in public places, security of transactions, mug shot matching, and human-computer interaction) and the availability of low-cost recording devices.

Although there are more reliable biometric recognition techniques, such as fingerprint and iris recognition, these techniques are intrusive and their success depends heavily on user cooperation, since the user must position her eye in front of the iris scanner or place her finger in the fingerprint device. Face recognition, on the other hand, is non-intrusive, since it is based on images recorded by a distant camera, and can be very effective even if the user is not aware of the existence of the face recognition system. The human face is undoubtedly the most common characteristic used by humans to recognize other people, which is why personal identification based on facial images is considered the friendliest of all biometrics.

Depending on the application, a face recognition system can operate in either identification or verification mode. In a face identification application, the system recognizes an individual by matching the input image against images of all users in a database and finding the best match. In a face verification application, the user claims an identity and the system accepts or rejects her claim by matching the input image against the image that corresponds to this specific identity, which can be stored either in a database or on an identification card (e.g. a smart card). In other words, face identification is a one-to-many comparison that answers the question “Who is the person in the input image? Is she someone in the database?”, while face verification is a one-to-one comparison that answers the question “Is the person in the input image who she claims to be?” In the sequel, the term face recognition will be used for both identification and verification unless a distinction needs to be made (see Figure 1).

Image matching usually involves three steps:

  1. Detection of the face in a complex background and localization of its exact position.
  2. Extraction of facial features such as eyes, nose, etc., followed by normalization to align the face with the stored face images.
  3. Face classification or matching.
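The three steps above can be sketched as a toy pipeline. The images, the bounding-box detector, the mean/variance features and the gallery values below are purely illustrative stand-ins for real detectors and descriptors, not part of any specific face recognition system:

```python
# Toy sketch of the three-step matching pipeline (detect -> extract -> match).
# All data and feature choices here are illustrative assumptions.

def detect_face(image):
    """Step 1: locate the face, here simply the bounding box of non-zero pixels."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return min(rows), min(cols), max(rows), max(cols)

def extract_features(image, box):
    """Step 2: crop the face window and compute a simple global feature vector
    (mean and variance of pixel intensities stand in for real descriptors)."""
    top, left, bottom, right = box
    pixels = [image[r][c] for r in range(top, bottom + 1)
                          for c in range(left, right + 1)]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, var]

def match(template, gallery):
    """Step 3: nearest-neighbour classification by Euclidean distance."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(gallery, key=lambda name: dist(template, gallery[name]))

image = [[0, 0, 0, 0],
         [0, 5, 7, 0],
         [0, 6, 8, 0],
         [0, 0, 0, 0]]
box = detect_face(image)       # (1, 1, 2, 2)
probe = extract_features(image, box)
gallery = {"alice": [6.5, 1.25], "bob": [40.0, 100.0]}
print(match(probe, gallery))   # -> alice
```

Real systems replace each stub with a trained component, but the control flow (detect, normalize/extract, classify) is the same.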

In addition, a face recognition system usually consists of the following four modules:

  1. Sensor module, which captures face images of an individual. Depending on the sensor modality, the acquisition device may be a black and white or color camera, a 3D sensor capturing range (depth) data, or an infrared camera capturing infrared images.
  2. Face detection and feature extraction module. The acquired face images are first scanned to detect the presence of faces and find their exact location and size. The output of face detection is an image window containing only the face area; irrelevant information, such as background, hair, neck, shoulders and ears, is discarded. The resulting face image is then further processed to extract a set of salient or discriminatory, local or global features, which will be used by the face classifier to identify or verify the identity of an unknown face. Such features may be measurements of characteristics of local facial features (such as eyes, nose and mouth) or global features such as transformation coefficients of a global image decomposition (PCA, LDA, wavelets, etc.). These features constitute the template or signature uniquely associated with the image.
  3. Classification module, in which the template extracted during step 2 is compared against the stored templates in the database to generate matching scores, which reveal how similar the faces in the probe and gallery images are. A decision-making module then either confirms (verification) or establishes (identification) the user’s identity based on the matching score. In the case of face verification, the matching score is compared to a predefined threshold and, based on the result of this comparison, the user is either accepted or rejected. In the case of face identification, a set of matching scores between the extracted template and the templates of enrolled users is calculated. If the template of user X produces the best score, then the unknown face is more similar to X than to any other person in the database. To ensure that the unknown face actually belongs to X and not to an impostor, the best matching score is also compared to a predefined threshold.
  4. System database module, which is used to extract and store the templates of enrolled users. This module is also responsible for enrolling users in the face recognition system database. During the enrolment of an individual, the sensor module records images of her face. These images are called gallery images and are used for training the classifier that will perform face recognition. Most commonly, several frontal neutral views of an individual are recorded, but face images depicting different facial expressions (neutral, smile, laugh, anger, etc.) and the presence or absence of glasses are often acquired as well. Sometimes gallery images are recorded in more than one session; the time interval between sessions may introduce variations in hairstyle, beard, make-up, etc. into the gallery images. The presence of such variations ensures more robust face recognition performance. Given a user’s set of acquired images, a set of features is extracted as in step 2 above, and a template that provides a compact and expressive representation of the user based on her images is generated. This is called training. The training algorithm depends on the face recognition method employed by the system. The aim of training is to encode the most discriminative characteristics of a user based on the chosen classifier, and to determine the values of the different thresholds. Sometimes more than one template per enrolled user is stored in the gallery database to account for different variations. Templates may also be updated over time, mainly to cope with variations due to aging.
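The decision logic of the classification module can be sketched as follows. The Euclidean distance, the templates and the threshold value are illustrative assumptions; real systems learn a similarity measure and tune the threshold on validation data:

```python
# Sketch of verification (one-to-one) vs. identification (one-to-many) decisions.
# Templates, distances and the threshold here are illustrative.

def distance(a, b):
    """Euclidean distance between two templates (lower = more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def verify(probe, claimed_template, threshold=1.0):
    """One-to-one: accept the claimed identity if the templates are close enough."""
    return distance(probe, claimed_template) <= threshold

def identify(probe, gallery, threshold=1.0):
    """One-to-many: find the best-matching enrolled user, then apply the
    threshold to reject impostors who resemble nobody closely enough."""
    best = min(gallery, key=lambda name: distance(probe, gallery[name]))
    return best if distance(probe, gallery[best]) <= threshold else None

gallery = {"alice": [0.0, 0.0], "bob": [5.0, 5.0]}
print(verify([0.2, 0.1], gallery["alice"]))   # -> True  (claim accepted)
print(identify([0.2, 0.1], gallery))          # -> alice
print(identify([9.0, 9.0], gallery))          # -> None  (impostor rejected)
```

Note how identification still needs the final threshold check: a best match always exists, but it may still be an impostor.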

The different steps of face recognition and a brief description of the most representative face detection and face recognition techniques are presented in the following.

Face Detection

Face detection is the first stage of an automatic face recognition system, since a face has to be located in the input image before it is recognized. A definition of face detection could be: given an image, detect all faces in it (if any) and locate their exact positions and size. Usually, face detection is a two-step procedure: first the whole image is examined to find regions that are identified as “face”. After the rough position and size of a face are estimated, a localization procedure follows which provides a more accurate estimation of the exact position and scale of the face. So while face detection is most concerned with roughly finding all the faces in large, complex images, which include many faces and much clutter, localization emphasizes spatial accuracy, usually achieved by accurate detection of facial features.

Face detection algorithms can be divided into four categories:

  1. Knowledge-based methods are based on human knowledge of typical face geometry and facial feature arrangement. Taking advantage of natural face symmetry and the natural top-to-bottom and left-to-right order in which features appear in the human face, these methods find rules to describe the shape, size, texture and other characteristics of facial features (such as eyes, nose, chin and eyebrows) and the relationships between them (relative positions and distances). A hierarchical approach may be used, which examines the face at different resolution levels. At higher levels, possible face candidates are found using a rough description of face geometry. At lower levels, facial features are extracted and an image region is identified as face or non-face based on predefined rules about facial characteristics and their arrangement. The main issue in such techniques is finding a successful way to translate human knowledge about face geometry into meaningful and well-defined rules. Another problem is that they do not work very well under varying pose or head orientation.
  2. Feature invariant approaches aim to find structural features that exist even when the viewpoint or lighting conditions vary, and then use these to locate faces. Different structural features are used: local facial features, texture, shape and skin color. Local features such as eyes, eyebrows, nose and mouth are extracted using multi-resolution or derivative filters, edge detectors, morphological operations or thresholding. Statistical models are then built to describe their relationships and verify the existence of a face. Neural networks, graph matching and decision trees have also been proposed to verify face candidates. Skin color is another powerful cue for detection, because color scene segmentation is computationally fast, while being robust to changes in viewpoint, scale and shading, to partial occlusion and to complex backgrounds. The color-based approach labels each pixel according to its similarity to skin color, and subsequently labels a sub-region as a face if it contains a large blob of skin color pixels. It is sensitive to illumination, the existence of other skin color regions, occlusion, and adjacent faces. There are also techniques that combine several features to improve detection accuracy. Usually, they use features such as texture, shape and skin color to find face candidates and then use local facial features such as eyes, nose and mouth to verify the existence of a face. Feature invariant approaches can be problematic if image features are severely corrupted or deformed due to illumination, noise or occlusion.
  3. Template-based methods. To detect a face in a new image, first the head outline, which is fairly consistently roughly elliptical, is detected using filters, edge detectors or silhouettes. Then the contours of local facial features are extracted in the same way, exploiting knowledge of face and feature geometry. Finally, the correlation between features extracted from the input image and predefined stored templates of the face and facial features is computed to determine whether a face is present in the image. Template matching methods based on predefined templates are sensitive to scale, shape and pose variations. To cope with such variations, deformable template methods have been proposed, which model face geometry using elastic models that are allowed to translate, scale and rotate. Model parameters may include not only shape, but intensity information of facial features as well.
  4. Appearance-based methods. While template-matching methods rely on a predefined template or model, appearance-based methods use large numbers of examples (images of faces and/or facial features) depicting different variations (face shape, skin color, eye color, open/closed mouth, etc.). Face detection can be viewed as a pattern classification problem with two classes: “face” and “non-face”. The “non-face” class contains images that may depict anything that is not a face, while the “face” class contains all face images. Statistical analysis and machine learning techniques are employed to discover the statistical properties or probability distribution function of the pixel brightness patterns of images belonging to the two classes. To detect a face in an input image, the whole image is scanned and image regions are identified as “face” or “non-face” based on these probability functions. Well-known appearance-based methods used for face detection are eigenfaces, LDA, neural networks, support vector machines and hidden Markov models.
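The skin-color cue used by feature invariant approaches can be sketched with a per-pixel rule. The specific RGB thresholds below are one classic heuristic (dominant red channel with sufficient spread between channels); they are an illustrative assumption, and production detectors instead learn skin models from data, often in chrominance spaces such as YCbCr:

```python
# Minimal sketch of color-based face candidate detection: label each pixel as
# skin or non-skin with a simple RGB rule, then measure the skin fraction of a
# region. The thresholds are a classic heuristic, not a tuned model.

def is_skin(r, g, b):
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15 and r > g and r > b)

def skin_ratio(region):
    """Fraction of skin-colored pixels; a large blob suggests a face candidate."""
    pixels = [p for row in region for p in row]
    return sum(is_skin(*p) for p in pixels) / len(pixels)

face_like = [[(220, 180, 150), (210, 170, 140)],
             [(215, 175, 145), (205, 165, 135)]]
sky_like  = [[(100, 150, 220), (110, 160, 230)],
             [(105, 155, 225), (115, 165, 235)]]
print(skin_ratio(face_like))   # -> 1.0
print(skin_ratio(sky_like))    # -> 0.0
```

As the text notes, such fixed rules are fast but sensitive to illumination and to other skin-colored regions in the scene.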

Face Recognition

Face recognition techniques can be roughly divided into two main categories: global approaches and feature-based techniques. In global approaches, the whole image serves as a feature vector, while in feature-based approaches a number of fiducial or control points are extracted and used for classification.

Global Approaches for Face Recognition

Global approaches model the variability of the face by analyzing its statistical properties based on a large set of training images. Representative global techniques are eigenfaces, Linear Discriminant Analysis (LDA), Support Vector Machines (SVM) and neural networks.

The first really successful face recognition method (and a reference point in the face recognition literature) is a holistic approach based on principal component analysis (PCA), applied to a set of images in order to extract a set of eigen-images known as eigenfaces. Every face is modeled as a linear combination of a small subset of these eigenfaces, and the weights of this representation are used for recognition. A test image is identified by locating the database image whose weights are closest to the weights of the test image. The concept of eigenfaces can be extended to eigenfeatures, such as eigeneyes, eigenmouth, etc.
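The eigenface pipeline can be sketched in a few lines of NumPy. The data here is random noise standing in for face images, and the sizes (six 8×8 images, three eigenfaces) are illustrative assumptions; the point is the mechanics of mean-centring, eigen-decomposition, projection and nearest-neighbour matching on the weights:

```python
import numpy as np

# Toy eigenfaces sketch: flatten training images, compute principal components
# of the mean-centred data, project every image onto the leading eigenfaces,
# and recognise a probe by the nearest projection. Data sizes are illustrative.

rng = np.random.default_rng(0)
n, h, w = 6, 8, 8                       # 6 tiny 8x8 "face" images
faces = rng.random((n, h * w))          # rows = flattened training images

mean_face = faces.mean(axis=0)
centred = faces - mean_face

# Eigenfaces = eigenvectors of the data covariance matrix; the SVD of the
# centred data gives them directly (rows of vt), ordered by decreasing variance.
_, _, vt = np.linalg.svd(centred, full_matrices=False)
k = 3
eigenfaces = vt[:k]                     # keep the k leading eigenfaces

weights = centred @ eigenfaces.T        # each face reduced to k weights

def recognise(probe):
    """Project the probe and return the index of the closest training face."""
    w_probe = (probe - mean_face) @ eigenfaces.T
    return int(np.argmin(np.linalg.norm(weights - w_probe, axis=1)))

# A slightly noisy copy of face 2 should still match face 2.
probe = faces[2] + 0.01 * rng.standard_normal(h * w)
print(recognise(probe))                 # -> 2
```

On real data the leading eigenfaces capture coarse facial structure, so a handful of weights per image suffices for matching.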

Using a probabilistic measure of similarity instead of the Euclidean distance between weights, the eigenface approach was extended to a Bayesian approach based on image differences. Face recognition is viewed as a two-class classification problem. The first class contains intensity differences between images of the same individual (depicting variations in expression, illumination, head orientation, use of cosmetics, etc.) and represents the intrapersonal facial variations. The second class contains intensity differences between images belonging to different people and represents the extrapersonal facial variations due to differences in identity. The probability distributions of the two mutually exclusive classes are estimated using a large training set, and the MAP (Maximum a Posteriori) rule is used for face recognition.
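In the usual presentation of this Bayesian approach, the similarity between two images is the posterior probability that their intensity difference Δ belongs to the intrapersonal class Ω_I rather than the extrapersonal class Ω_E:

```latex
S(\Delta) = P(\Omega_I \mid \Delta)
          = \frac{P(\Delta \mid \Omega_I)\,P(\Omega_I)}
                 {P(\Delta \mid \Omega_I)\,P(\Omega_I) + P(\Delta \mid \Omega_E)\,P(\Omega_E)}
```

Two images are declared to show the same person when P(Ω_I | Δ) > P(Ω_E | Δ); the class-conditional densities P(Δ | Ω_I) and P(Δ | Ω_E) are estimated from the large training set of intra- and extrapersonal difference images.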

Face recognition techniques using Linear/Fisher Discriminant Analysis (LDA) were also developed. LDA determines a subspace in which the between-class scatter (extrapersonal variability) is as large as possible, while the within-class scatter (intrapersonal variability) is kept constant. In this sense, the subspace obtained by LDA optimally discriminates between the classes, i.e. the faces. A combination of PCA and LDA was also proposed. Other global techniques include Support Vector Machines (SVM) and neural networks (NN).
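Concretely, with c classes (persons), class means μ_i, global mean μ and N_i training samples X_i per class, the two scatter matrices are

```latex
S_W = \sum_{i=1}^{c} \sum_{x \in X_i} (x - \mu_i)(x - \mu_i)^T, \qquad
S_B = \sum_{i=1}^{c} N_i\,(\mu_i - \mu)(\mu_i - \mu)^T,
\qquad
W^{*} = \arg\max_{W} \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|}
```

where W* is the projection sought by LDA: the Fisher criterion on the right maximizes between-class scatter relative to within-class scatter in the projected subspace.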

Feature Based Face Recognition Techniques

The main idea behind feature-based techniques is to discriminate among different faces based on measurements of structural attributes of the face. The most representative recent approaches are Embedded Hidden Markov Models (EHMMs), Elastic Graph Matching and the Dynamic Link Architecture.

For frontal views the significant facial features appear in a natural order from top to bottom (forehead, eyes, nose, mouth) and from left to right (e.g. left eye, right eye). EHMMs model the face as a sequence of states roughly corresponding to facial feature regions. The probability distribution functions of the EHMM states are approximated using observations extracted by scanning training images from left to right and top to bottom. To verify a face, the observations are first extracted from the input image and then their probability given the stored EHMM is calculated.

One of the most successful feature-based techniques is Elastic Bunch Graph Matching (EBGM), which is based on the Dynamic Link Architecture (DLA). The basic idea of EBGM is to represent the face using a set of local image features extracted from the intensity images at fiducial image points and to exploit their spatial coherence using a connected graph. Each node in the graph is assigned a set of Gabor wavelet coefficients, over different scales and orientations, extracted from the image function. The graph is adapted to each face in the face database by minimizing a cost function that locally deforms the graph.
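As commonly defined in the elastic graph matching literature, the Gabor kernels used to label the graph nodes have the form

```latex
\psi_j(\vec{x}) = \frac{k_j^2}{\sigma^2}
\exp\!\left(-\frac{k_j^2 \|\vec{x}\|^2}{2\sigma^2}\right)
\left[\exp\!\left(i\,\vec{k}_j \cdot \vec{x}\right) - \exp\!\left(-\frac{\sigma^2}{2}\right)\right]
```

where the wave vector k_j determines the scale and orientation of the kernel and the second exponential makes the filter insensitive to the mean image brightness. The complex filter responses at a fiducial point, collected over all scales and orientations j, form the "jet" attached to the corresponding graph node.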

Approaches that use both global and local features have also been proposed. For example, the modular eigenspace approach uses both eigenfaces and eigenfeatures, while the Local Feature Analysis (LFA) extracts topographic local features from the global PCA modes and uses them for recognition.

Problems and Considerations

Automatic face recognition is a particularly complex task that involves detection and location of faces in a cluttered background, followed by normalization and recognition. The human face is a very challenging pattern to detect and recognize: while its anatomy is rigid enough that all faces have the same structure, there are at the same time many environmental and personal factors affecting facial appearance. The main problem of face recognition is the large variability of the recorded images due to pose, illumination conditions, facial expressions, use of cosmetics, different hairstyles, presence of glasses, beard, etc. Images of the same individual taken at different times may sometimes exhibit more variability due to the aforementioned factors (intrapersonal variability) than images of different individuals due to gender, race, age and individual variations (extrapersonal variability). One way of coping with intrapersonal variations is to include images with such variations in the training set. While this is good practice for variations such as facial expressions, use of cosmetics and presence of glasses or a beard, it may not be successful in the case of illumination or pose variations. Another crucial parameter in face recognition is aging. A robust recognition system should be able to recognize an individual even after some years, especially in forensic applications such as mug-shot matching. This is a very challenging task, which has not yet been successfully addressed.

Recent public facial recognition benchmarks have shown that, in general, identification performance decreases linearly with the logarithm of the number of people in the gallery database. Also, from a demographic point of view, it was found that recognition rates were higher for males than for females, and higher for older people than for younger people. These tests also revealed that while the best recognition techniques were successful on large face databases recorded in well-controlled environments, their performance deteriorated seriously in uncontrolled environments, mainly due to variations in illumination and head rotation. Such variations have proven to be one of the biggest problems of face recognition systems.

Several techniques have been proposed to recognize faces under varying pose. One approach is the automatic generation of novel views resembling the pose in the probe image. This is achieved either by using a face model (an active appearance model (AAM) or a deformable 3D model), or by warping frontal images using the estimated optical flow between probe and gallery. Classification is subsequently based on the similarity between the probe image and the generated view. A different approach is based on building a pose-varying eigenspace by recording several images of each person under varying pose. Representative techniques are the view-based subspace and the predictive characterized subspace. More recently, techniques that rely on 3D shape data have been proposed.

The problem of coping with illumination variations is increasingly appreciated by the scientific community, and the several techniques that have been proposed may be roughly classified into two main categories. The first category contains techniques seeking illumination-insensitive representations of face images. Several representations have been shown to be relatively insensitive to illumination variability, e.g. the direction of the image gradient or the sum of gradients of ratios between probe and gallery images.
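Why the gradient direction is a reasonable illumination-insensitive representation can be seen in a few lines: scaling an image by a positive global illumination factor scales both gradient components together, leaving their ratio, and hence the orientation, unchanged. The random image below stands in for a face image:

```python
import numpy as np

# Sketch of an illumination-insensitive representation: the direction of the
# image gradient is unchanged when the image is multiplied by a positive
# global illumination factor, because both gradient components scale together.

def gradient_direction(image):
    """Per-pixel gradient orientation via simple finite differences."""
    gy, gx = np.gradient(image.astype(float))
    return np.arctan2(gy, gx)

rng = np.random.default_rng(1)
face = rng.random((16, 16))            # stand-in for a face image
brighter = 3.0 * face                  # same scene under 3x the illumination

d1 = gradient_direction(face)
d2 = gradient_direction(brighter)
print(np.allclose(d1, d2))             # -> True: directions are identical
```

Of course, real illumination change is rarely a single global factor (shadows and specularities are local effects), which is why such representations are only relatively, not fully, insensitive.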

The second approach relies on the development of generative appearance models, able to reconstruct novel gallery images resembling the illumination in the probe images. Some of these techniques utilize a large number of example images of the same person under different illumination conditions to reconstruct novel images. Other approaches utilize a 3D range image and albedo map of the person’s face to render novel images under arbitrary illumination, while others are based on a combination of the above. Finally, a third more recent approach is based on computer graphics techniques for relighting the probe image so that it resembles the illumination in gallery images.

Conclusions and Future Developments

The problem of machine face recognition has been an ongoing subject of research for more than 20 years. Although a large number of approaches have been proposed in the literature and have been implemented successfully for real-world applications, robust face recognition is still a challenging subject, mainly because of large facial variability, pose variations and uncontrolled environmental conditions. The use of novel sensors, such as 3D, can help overcome limitations due to viewpoint and lighting variations. On the other hand, it has been acknowledged that there is no perfect biometric and thus the combination of different modalities, e.g. face combined with speaker, fingerprint and/or hand recognition, is required to achieve the desired level of performance.

