Estimation of Human Head Pose using Similarity Measures
Jamie Sherrah, Eng Jon Ong and Shaogang Gong




3D human head pose is an important cue for scene interpretation and remote computer control.  To determine the pose of a head in an image, one must first locate the head in the image.  However, detecting faces even at frontal views is a challenging task, let alone at arbitrary views; at best one might construct a pose-specific face detection system.  Pose estimation is therefore a chicken-and-egg problem: we must know where the face is to determine its pose, but we need to know the head pose to find the face.

A brute-force approach would be to construct a face detection module for each possible pose, then scan exhaustively over every position, scale and pose.  There are two problems with this approach:

  1. computational expense; and
  2. lack of training data labelled with head pose angles.
Therefore any viable technique must be computationally inexpensive, and not rely too heavily on large amounts of training data.
 

Similarity Measures for Estimating Head Pose

To estimate head pose we employ Shimon Edelman's concept of second-order similarity.  We collect a database of face images taken at regular intervals across the pose sphere in yaw and tilt for N different people.



Given a novel face, a hypothesised head pose, and some measure of similarity between two face images, we can construct the N-dimensional vector of similarities of this face to database faces at the given pose.  The idea is that this vector is now a signature or feature vector for the novel face.  The norm of the vector can be used as a measure of similarity to faces at that pose (small norm means high similarity).
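As a minimal sketch of this idea, the signature vector and its norm can be computed as follows.  Plain Euclidean image distance stands in here for the unspecified similarity measure, and the array shapes, pose labels and function names are all illustrative assumptions:

```python
import numpy as np

def similarity_vector(novel_face, database, pose):
    """N-vector of distances from a novel face to each database face at a pose.

    database maps pose -> (N, num_pixels) array of flattened face images.
    Euclidean distance is a stand-in for the unspecified similarity
    measure, so a SMALL entry means HIGH similarity.
    """
    return np.linalg.norm(database[pose] - novel_face, axis=1)

def best_pose(novel_face, database):
    """Choose the hypothesised pose whose signature vector has smallest norm."""
    norms = {p: np.linalg.norm(similarity_vector(novel_face, database, p))
             for p in database}
    return min(norms, key=norms.get)

# toy data: 4 people, 3 yaw poses, 16x16 "faces" as random pixel vectors
rng = np.random.default_rng(0)
db = {yaw: rng.random((4, 256)) for yaw in (-30, 0, 30)}
probe = db[0][1] + 0.01 * rng.random(256)   # person 1 observed near yaw 0
print(best_pose(probe, db))                 # -> 0
```

Note that only N distance evaluations are needed per hypothesised pose, which is what makes the signature cheap to compute.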

The similarity-measure method meets the two criteria above.  It is fast, requiring only N similarity evaluations per pose, and it makes no attempt to model the data distribution, so the lack of training data is addressed directly.  The problems with applying this method to pose estimation as described stem from variations in lighting conditions and spatial alignment.  These difficulties could be alleviated by a more sophisticated similarity measure; as they stood, however, the raw similarity measures were not robust enough for direct application.  We examined two methods for pre-processing face images to improve the approach: orientation-selective filtering and principal component analysis (PCA).

Gabor filters are complex sinusoids modulated by a Gaussian envelope.
 

By pre-filtering the facial images using oriented Gabor filters, we found that at certain head poses, different filter orientations produce optimal discrimination.
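The following is a minimal illustration of such orientation-selective filtering: a complex Gabor kernel is built and its mean response magnitude measured at two orientations.  The kernel parameters (wavelength, envelope width, size) are hypothetical choices for the example, not those of the original experiments:

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma):
    """Complex Gabor filter: a complex sinusoid modulated by a Gaussian envelope.

    theta is the filter orientation in radians.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotated coordinate
    gauss = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) # Gaussian envelope
    wave = np.exp(1j * 2.0 * np.pi * xr / wavelength) # complex sinusoid
    return gauss * wave

def gabor_energy(image, theta, wavelength=4.0, sigma=2.0, size=9):
    """Mean magnitude of the valid-mode Gabor response over an image."""
    k = gabor_kernel(size, theta, wavelength, sigma)
    patches = np.lib.stride_tricks.sliding_window_view(image, k.shape)
    resp = np.abs(np.tensordot(patches, np.conj(k), axes=([2, 3], [0, 1])))
    return resp.mean()

# vertical stripes respond most strongly to the theta=0 (horizontally
# oscillating) filter, illustrating orientation selectivity
img = np.tile(np.sin(2 * np.pi * np.arange(32) / 4.0), (32, 1))
print(gabor_energy(img, 0.0) > gabor_energy(img, np.pi / 2))  # True
```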

We also performed PCA on images of different people across profile-to-profile views.  Re-projecting these faces onto the first two principal components revealed a clear relationship between these two PCs and head yaw.
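A sketch of this projection step, using an SVD of the mean-centred image matrix; the toy "faces" parameterised by a yaw variable are purely illustrative, not real data:

```python
import numpy as np

def first_two_pcs(faces):
    """Project flattened face images onto their first two principal components.

    faces: (num_images, num_pixels).  Returns (num_images, 2) coordinates.
    Plotting these coordinates against known yaw would expose the kind of
    smooth PC/yaw relationship described in the text.
    """
    centred = faces - faces.mean(axis=0)
    # rows of vt are principal directions, ordered by singular value
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T

# toy data: "images" whose pixels vary smoothly with a 1-D yaw parameter
yaw = np.linspace(-90, 90, 19)
faces = np.cos(np.outer(yaw / 90.0, np.arange(64)))  # hypothetical stand-in
coords = first_two_pcs(faces)
print(coords.shape)  # (19, 2)
```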

The results of the experiment show that discrimination of faces at different poses using the similarity method can work, and results can be improved by pre-processing the images.
 

Tracking Head Pose, Face Position and Scale by Fusing Perceptual Cues

The overall aim of this work is to perform real-time pose estimation and face detection.  However, the similarity method is still too slow for exhaustive face scanning.  In this work we exploit temporal continuity of scale, position and head pose by tracking all three quantities simultaneously, using the CONDENSATION algorithm.  Without exploiting all available constraints, however, the tracker loses lock immediately.  Therefore we fuse all available cues:

  1. similarity measurements obtained from a database of faces taken across different people and poses;
  2. head position obtained in absolute terms using skin colour segmentation;
  3. covariance between changes in head pose and changes in face position; and
  4. covariance between absolute head pose and absolute face position.

Such a tracker can work successfully.  An example is shown in this MPEG (2.3 MB).
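A minimal sketch of one CONDENSATION iteration (factored sampling, diffusion, re-weighting).  The single `observe` likelihood here stands in for the product of the fused cues listed above; the state dimensions, noise levels and toy observation model are all assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def condensation_step(particles, weights, observe, drift_std):
    """One CONDENSATION iteration over a particle set.

    particles: (M, D) sampled states, e.g. (x, y, scale, yaw, tilt).
    observe(state) returns a likelihood, which in the tracker described
    above would be a product of the independent fused cues.
    """
    m = len(particles)
    idx = rng.choice(m, size=m, p=weights)                    # factored sampling
    new = particles[idx] + rng.normal(0, drift_std, particles.shape)  # diffusion
    w = np.array([observe(s) for s in new])                   # re-weight
    return new, w / w.sum()

# toy tracker: 2-D state, true state fixed, Gaussian observation likelihood
true_state = np.array([10.0, 20.0])
observe = lambda s: np.exp(-0.5 * np.sum((s - true_state) ** 2) / 4.0)

particles = rng.normal(0, 15, (500, 2))
weights = np.ones(500) / 500
for _ in range(20):
    particles, weights = condensation_step(particles, weights, observe, 0.5)
estimate = (weights[:, None] * particles).sum(axis=0)  # converges near truth
```

The key design point is that the observation density factorises over cues, so each cue only needs to be evaluated at the sampled states rather than over the whole image.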


 

Relevant Publications:

Jamie Sherrah and Shaogang Gong, "Fusion of Perceptual Cues for Robust Tracking of Head Pose and Position",  Pattern Recognition, Special Issue on Data and Information Fusion in Image Processing and Computer Vision, 2000, to appear.

Jamie Sherrah and Shaogang Gong, "Fusion of 2D Face Alignment and 3D Head Pose Estimation for Robust and Real-Time Performance",  Proceedings of IEEE International Workshop on Recognition, Analysis and Tracking of Faces and Gestures in Real-Time Systems, 26-27 September 1999, Corfu, Greece.

Jamie Sherrah and Shaogang Gong, "Fusion of Perceptual Cues using Covariance Estimation",  Proceedings of BMVC'99, 13-16 September 1999, Nottingham, England.

Jamie Sherrah, Shaogang Gong and Eng Jon Ong, "Understanding Pose Discrimination in Similarity Space",  Proceedings of BMVC'99, 13-16 September 1999, Nottingham, England.

Shaogang Gong, Eng-Jon Ong and Stephen McKenna, "Learning to Associate Faces across Views in Vector Space of Similarities to Prototypes",  Proceedings of BMVC'98, 14-17 September 1998, Southampton, England.

Shaogang Gong, Stephen McKenna and John J. Collins, "An Investigation into Face Pose Distributions",  Second International Conference on Automated Face and Gesture Recognition, Killington, Vermont, US, October 1996.
 



Jamie Sherrah  8/6/2000