Complete Communications Engineering

The Viola-Jones algorithm is a widely used mechanism for object detection. The main property of this algorithm is that training is slow, but detection is fast. This algorithm uses Haar basis feature filters, so it does not use multiplications.

The efficiency of the Viola-Jones algorithm can be significantly increased by first generating the integral image.

face-detection-viola-jones-eq1The integral image allows integrals for the Haar extractors to be calculated by adding only four numbers. For example, the image integral of area ABCD (Fig.1) is calculated as II(yA,xA) – II(yB,xB) – II(yC,xC) + II(yD,xD).

viola-jones-image-integration-area
Fig.1 Image area integration using integral image

Detection happens inside a detection window.  A minimum and maximum window size is chosen, and for each size a sliding step size is chosen.  Then the detection window is moved across the image as follows:

  1. Set the minimum window size, and sliding step corresponding to that size.
  2. For the chosen window size, slide the window vertically and horizontally with the same step. At each step, a set of N face recognition filters is applied. If one filter gives a positive answer, the face is detected in the current widow.
  3. If the size of the window is the maximum size stop the procedure. Otherwise increase the size of the window and corresponding sliding step to the next chosen size and go to the step 2.

Each face recognition filter (from the set of N filters) contains a set of cascade-connected classifiers. Each classifier looks at a rectangular subset of the detection window and determines if it looks like a face. If it does, the next classifier is applied. If all classifiers give a positive answer, the filter gives a positive answer and the face is recognized. Otherwise the next filter in the set of N filters is run.

Each classifier is composed of Haar feature extractors (weak classifiers). Each Haar feature is the weighted sum of 2-D integrals of small rectangular areas attached to each other. The weights may take values ±1. Fig.2 shows examples of Haar features relative to the enclosing detection window. Gray areas have a positive weight and white areas have a negative weight. Haar feature extractors are scaled with respect to the detection window size.

face-detection-features-enclosing-window
Fig.2 Example rectangle features shown relative to the enclosing detection window

The classifier decision is defined as:

viola-jones-classifier-eq1

fm,i is the weighted sum of the 2-D integrals.  is the decision threshold for the i-th feature extractor.  αm,i and βm,i are constant values associated with the i-th feature extractor.  θm is the decision threshold for the m-th classifier.

object-detection-viola-jones-filter
Fig.3 Object detection Viola-Jones filter

The cascade architecture is very efficient because the classifiers with the fewest features are placed at the beginning of the cascade, minimizing the total required computation. The most popular algorithm for features training is AdaBoost.

More Information