
Face Detection on Android and iOS

Android and iOS have built-in support for detecting faces in photos. We present a small platform-independent abstraction layer to access this functionality through a uniform interface.

You can find the source code on GitHub as part of our JustAddCode repository.

Like we did in our post on Cross Platform Text-to-Speech, we expose the functionality through a Delphi object interface. The implementation of this interface is platform-dependent. This is a common way to hide platform-specific details, and one of the ways we discussed in our post on Cross Platform Abstraction.

IgoFaceDetector API

The interface is super simple:

type
  IgoFaceDetector = interface
    function DetectFaces(const ABitmap: TBitmap): TArray<TgoFace>;
  end;

There is only one method. You provide it with a bitmap of a photo that may contain one or more faces, and it returns an array of TgoFace records, one for each detected face. This record is also pretty simple:

type
  TgoFace = record
    Bounds: TRectF;
    LeftEyePosition: TPointF;
    RightEyePosition: TPointF;
    EyesDistance: Single;
  end;

It contains a rectangle (in bitmap coordinates) that surrounds the face, as well as the positions of the two eyes in the bitmap and the distance between the eyes (in pixels). Sometimes, one or both eyes cannot be reliably detected. In that case, the eye positions will be (0, 0) and the distance between the eyes will be 0.

To create a face detector, call the TgoFaceDetector.Create factory function:

type
  TgoFaceDetector = class // static
  public
    class function Create(
      const AAccuracy: TgoFaceDetectionAccuracy = TgoFaceDetectionAccuracy.High;
      const AMaxFaces: Integer = 5): IgoFaceDetector; static;
  end;

Both parameters are optional. The AAccuracy parameter lets you trade face detection accuracy for speed: Low accuracy is faster, High accuracy is slower but more thorough. This parameter is only used on iOS; on Android, you cannot specify an accuracy. The AMaxFaces parameter specifies the maximum number of faces to detect in a photo. Again, higher values increase detection time. If you want to do real-time face detection using the built-in camera, you probably want to set this parameter to 1.
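Putting it together, typical calling code could look like this. Note that MyBitmap, DrawFaceRect and DrawEyes are hypothetical placeholders; substitute your own bitmap and drawing routines:

var
  Detector: IgoFaceDetector;
  Face: TgoFace;
begin
  { Defaults: high accuracy (iOS only) and at most 5 faces. }
  Detector := TgoFaceDetector.Create;
  for Face in Detector.DetectFaces(MyBitmap) do
  begin
    DrawFaceRect(Face.Bounds); { hypothetical helper }
    { Eye positions are (0, 0) when an eye could not be detected. }
    if (Face.EyesDistance > 0) then
      DrawEyes(Face.LeftEyePosition, Face.RightEyePosition); { hypothetical }
  end;
end;

Because the detector is returned as an interface, the platform-specific implementation is released automatically through reference counting.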

That’s all there is to it. Our GitHub repository has a little sample application that runs the face detector on a set of photographs. This is what the result looks like on iOS and Android:

[Screenshots: the sample application running on iOS and Android]

The face detection capabilities differ between the two platforms. In general, the iOS version is able to detect more faces (or detect them with higher accuracy) than the Android version.

Null Implementation

This post covers face detection on Android and iOS only, since these platforms have built-in support for it. On other platforms, you can use libraries like OpenCV to create similar functionality, but that is not as straightforward and is outside the scope of this article.

However, we don’t want you to litter your code with {$IFDEF}’s to exclude the face detector on other platforms. So there is also a null (or no-op) implementation of the IgoFaceDetector interface, which does nothing and just returns an empty array of faces:

type
  TgoFaceDetectorImplementation = class(TInterfacedObject, IgoFaceDetector)
  protected
    { IgoFaceDetector }
    function DetectFaces(const ABitmap: TBitmap): TArray<TgoFace>;
  end;

function TgoFaceDetectorImplementation.DetectFaces(
  const ABitmap: TBitmap): TArray<TgoFace>;
begin
  Result := nil;
end;

This implementation is used for all platforms except iOS and Android.

iOS Implementation

On iOS, face detection functionality is available as part of the CoreImage framework. You need to create two objects: one of type CIContext and one of type CIDetector:

constructor TgoFaceDetectorImplementation.Create(
  const AAccuracy: TgoFaceDetectionAccuracy; const AMaxFaces: Integer);
var
  Options: NSDictionary;
  Value: Pointer;
begin
  inherited Create;
  FMaxFaces := AMaxFaces;

  FContext := TCIContext.Wrap(TCIContext.OCClass.contextWithOptions(nil));
  FContext.retain;

  if (AAccuracy = TgoFaceDetectionAccuracy.Low) then
    Value := CIDetectorAccuracyLow
  else
    Value := CIDetectorAccuracyHigh;
  Options := TNSDictionary.Wrap(TNSDictionary.OCClass.dictionaryWithObject(
    Value, CIDetectorAccuracy));

  FDetector := TCIDetector.Wrap(TCIDetector.OCClass.detectorOfType(
    CIDetectorTypeFace, FContext, Options));
  FDetector.retain;
end;

The context is created with default options using the contextWithOptions “class” function. Whenever you create an Objective-C object using one of these “class” functions (instead of using a constructor), the returned object is a so-called auto-released object. This means that the object will be destroyed automatically at some point unless you retain a reference to it. So we call retain to keep the object alive, and call release once we no longer need it (which is done inside the destructor).
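A sketch of what that destructor could look like (the actual implementation is in the repository; the field names match the constructor above):

destructor TgoFaceDetectorImplementation.Destroy;
begin
  { Balance the retain calls made in the constructor. }
  if (FDetector <> nil) then
    FDetector.release;
  if (FContext <> nil) then
    FContext.release;
  inherited;
end;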

A similar model is used to create the detector object. You pass a dictionary with detection options. In this case, the dictionary contains just a single option with the detection accuracy. You also need to specify the type of detector you require. In our example, we want a face detector, but you can also create a detector to detect text or QR codes for example.
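For example, creating a QR-code detector instead could look like the sketch below. CIDetectorTypeQRCode is a real CoreImage constant, but depending on your Delphi version you may need to declare the import for it yourself:

{ Hypothetical sketch: assumes CIDetectorTypeQRCode is imported from CoreImage. }
QRDetector := TCIDetector.Wrap(TCIDetector.OCClass.detectorOfType(
  CIDetectorTypeQRCode, FContext, nil)); { nil = default options }
QRDetector.retain;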

The code to actually detect the faces is a bit more involved, but not too bad:

function TgoFaceDetectorImplementation.DetectFaces(
  const ABitmap: TBitmap): TArray<TgoFace>;
var
  ...
begin
  { Create a CIImage with the contents of our FireMonkey bitmap. }
  if (not ABitmap.Map(TMapAccess.Read, SrcData)) then
    Exit(nil);
  try
    Data := TNSData.Wrap(TNSData.OCClass.dataWithBytes(SrcData.Data,
      SrcData.Width * SrcData.Height * SrcData.BytesPerPixel));
  finally
    ABitmap.Unmap(SrcData);
  end;

  { Now we can create a CIImage using this data. }
  Size.width := ABitmap.Width;
  Size.height := ABitmap.Height;
  Format := kCIFormatBGRA8;
  Image := TCIImage.Wrap(TCIImage.OCClass.imageWithBitmapData(Data,
    ABitmap.Width * 4, Size, Format, nil));

  { Pass the image to the face detector. }
  Features := FDetector.featuresInImage(Image, nil);
  if (Features = nil) then
    Exit(nil);

  { Convert the CIFaceFeature objects to TgoFace records. }
  Count := Min(Features.count, FMaxFaces);
  SetLength(Result, Count);

  for I := 0 to Count - 1 do
  begin
    SrcFeature := TCIFaceFeature.Wrap(Features.objectAtIndex(I));

    { Calculate the face bounds. }
    R := SrcFeature.bounds;
    DstFace.Bounds.Left := R.origin.x;
    DstFace.Bounds.Top := ABitmap.Height - R.origin.y - R.size.height;
    DstFace.Bounds.Width := R.size.width;
    DstFace.Bounds.Height := R.size.height;

    { Convert the eye positions. }
    if (SrcFeature.hasLeftEyePosition) then
    begin
      P := SrcFeature.leftEyePosition;
      DstFace.LeftEyePosition := PointF(P.x, ABitmap.Height - P.y);
    end
    else
      DstFace.LeftEyePosition := PointF(0, 0);

    if (SrcFeature.hasRightEyePosition) then
    begin
      P := SrcFeature.rightEyePosition;
      DstFace.RightEyePosition := PointF(P.x, ABitmap.Height - P.y);
    end
    else
      DstFace.RightEyePosition := PointF(0, 0);

    { Calculate the distance between the eyes manually. }
    DstFace.EyesDistance := DstFace.LeftEyePosition.Distance(DstFace.RightEyePosition);

    Result[I] := DstFace;
  end;
end;

The first step is to convert the given FireMonkey bitmap to a CoreImage bitmap that the detector can handle. This involves creating an NSData object with the raw bitmap data, and passing that data to the imageWithBitmapData “class” function of CIImage.

This is another case where we use “class” functions to create Objective-C objects. But in this case, the NSData and CIImage objects are only used for the duration of the DetectFaces method. So we don’t have to use retain (and release) to keep the objects alive.

Next, the image is passed to the featuresInImage method of the detector, which returns an NSArray of CIFaceFeature objects. The second half of the code above just converts these CIFaceFeature objects to TgoFace records. The only caveat is that CIImage objects store bitmaps in bottom-up order, so they are upside down compared to FireMonkey bitmaps. That is why, in the code above, vertical coordinates are subtracted from the bitmap height to compensate.

Android Implementation

The process on Android is similar; only the details differ. You need to create a JFaceDetector Java object and call its findFaces method, passing a Java JBitmap object and a (Java) array of JFaceDetector_Face objects.

The type name JFaceDetector_Face means that the FaceDetector class has a nested class type called Face. In Delphi, these get combined into a top-level JFaceDetector_Face class.

A complicating factor is that the Java bitmap must be in RGB565 format. This means 16 bits per pixel, using 5 bits for the red and blue components and 6 bits for the green component (since the human eye is more sensitive to greens). The FireMonkey bitmap is in 32-bit RGBA format, so we need to convert it. This is what the first half of the DetectFaces implementation does:

function TgoFaceDetectorImplementation.DetectFaces(
  const ABitmap: TBitmap): TArray<TgoFace>;
var
  ...
begin
  { Android's FaceDetector class requires Width to be even }
  Width := ABitmap.Width;
  OddWidth := Odd(Width);
  if (OddWidth) then
    Dec(Width);
  Height := ABitmap.Height;

  { Use the previously cached FaceDetector instance if available and the
    dimensions haven't changed. }
  if (FDetector = nil) or (Width <> FWidth) or (Height <> FHeight) then
  begin
    FDetector := nil;
    FWidth := Width;
    FHeight := Height;
    FDetector := TJFaceDetector.JavaClass.init(Width, Height, FMaxFaces);
  end;

  { The FaceDetector class works with the Android Bitmap class.
    FaceDetector requires that the bitmap is in 565 format }
  Bitmap := TJBitmap.JavaClass.createBitmap(Width, Height,
    TJBitmap_Config.JavaClass.RGB_565);
  BitmapId := (Bitmap as ILocalObject).GetObjectID;

  { Use NDK AndroidBitmap APIs for fast access to native Android bitmaps. }
  if (AndroidBitmap_lockPixels(TJNIResolver.GetJNIEnv, BitmapId, @Dst) <> 0) then
    Exit(nil);
  try
    { Copy the FireMonkey bitmap to the native Android bitmap,
      converting to RGB565 format in the process. }
    if (not ABitmap.Map(TMapAccess.Read, SrcBitmap)) then
      Exit(nil);
    try
      Src := SrcBitmap.Data;
      for Y := 0 to Height - 1 do
      begin
        for X := 0 to Width - 1 do
        begin
          C := Src^;
          R := (C shr (16 + 3)) and $1F; // 5 bits
          G := (C shr ( 8 + 2)) and $3F; // 6 bits
          B := (C shr ( 0 + 3)) and $1F; // 5 bits
          Dst^ := (R shl 11) or (G shl 5) or B;
          Inc(Src);
          Inc(Dst);
        end;
        if OddWidth then
          Inc(Src);
      end;
    finally
      ABitmap.Unmap(SrcBitmap);
    end;
  finally
    AndroidBitmap_unlockPixels(TJNIResolver.GetJNIEnv, BitmapId);
  end;
  ...
end;

The top half creates a JFaceDetector object. The only caveat here is that the image width needs to be even.

The remainder of the code converts the FireMonkey bitmap to a Java JBitmap in RGB565 format. Working with Java bitmaps can be time consuming. It is much faster to use the AndroidBitmap APIs from the NDK instead. These APIs provide native access to the underlying pixels in a Java bitmap. The AndroidBitmap_lockPixels API gives us access to these pixels, in a similar way that TBitmap.Map gives access to the pixels in a FireMonkey bitmap. The bulk of the code converts each 32-bit pixel in the FireMonkey bitmap to a 16-bit pixel in the Java bitmap. The following diagram may clarify the conversion:

[Diagram: converting a 32-bit RGBA pixel to a 16-bit RGB565 pixel]

Some bit shuffling is used to copy the most significant bits from the red, green and blue components of each 32-bit pixel, and pack them into a 16-bit pixel.
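As a worked example of this packing (using the same shifts as the loop above, with the red component in bits 16–23 of the 32-bit pixel; Pixel565 is a hypothetical Word variable):

C := $00FF8040;           { R = $FF, G = $80, B = $40 }
R := (C shr 19) and $1F;  { $1F: top 5 bits of red }
G := (C shr 10) and $3F;  { $20: top 6 bits of green }
B := (C shr  3) and $1F;  { $08: top 5 bits of blue }
Pixel565 := (R shl 11) or (G shl 5) or B;
{ = $F800 or $0400 or $0008 = $FC08 }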

Once we have the Java bitmap, the remainder of the code is fairly straightforward:

  ...
  { Create a Java array of JFaceDetector_Face objects. }
  Faces := TJavaObjectArray<JFaceDetector_Face>.Create(FMaxFaces);
  { Pass this array to the SrcFace detector to find the faces. }
  Count := FDetector.findFaces(Bitmap, Faces);
  if (Count = 0) then
    Exit(nil);

  { Convert the JFaceDetector_Face objects to TgoFace records. }
  SetLength(Result, Count);
  Point := TJPointF.Create;
  for I := 0 to Count - 1 do
  begin
    { Get Java SrcFace from array }
    SrcFace := TJFaceDetector_Face.Wrap(Faces.GetRawItem(I));
    SrcFace.getMidPoint(Point);
    P.X := Point.x;
    P.Y := Point.y;
    Distance := SrcFace.eyesDistance;

    { Calculate the position of the eyes based on the mid point of the SrcFace
      and the distance between the eyes.
      NOTE: We should use SrcFace.pose to rotate the position of the eyes around
      the midpoint. However, on most Android devices, Pose always returns 0,
      so there is not much point in using it. }
    DstFace.LeftEyePosition := PointF(P.X - 0.5 * Distance, P.Y);
    DstFace.RightEyePosition := PointF(P.X + 0.5 * Distance, P.Y);
    DstFace.EyesDistance := Distance;

    { Android does not return the bounds of the SrcFace. Instead, we set it
      ourselves based on the eye positions. We set it in such a way to match
      the way iOS does it. }
    Distance := Distance * 1.35;
    DstFace.Bounds := RectF(P.X - Distance, P.Y - 0.7 * Distance,
                            P.X + Distance, P.Y + 1.3 * Distance);
    Result[I] := DstFace;
  end;
end;

We need to create a Java bridge array of JFaceDetector_Face objects and pass it to the JFaceDetector.findFaces method, together with the bitmap. After that, again it becomes just a matter of converting these JFaceDetector_Face objects to TgoFace records. There are some differences compared to the iOS version though.

First, the Android version does not give us the position of the eyes. Instead, it gives us the midpoint between the eyes and the distance between the eyes. Using this information, we can calculate the eye positions ourselves. A complicating factor is that we should rotate the eye positions based on a “pose” angle. However, on the Android devices I have tested this on, the pose angle is always 0 degrees, so we can simplify the eye position calculations.

Also, the Android version does not give us the bounding box for the entire face. So we calculate (or rather, estimate) this bounding box ourselves using the midpoint between the eyes and the “normal” proportions of a human face. The constants in the calculation were found through some trial and error to get results similar to the iOS version.

Finishing Up

Face detection doesn’t have to be difficult; especially if the operating system has built-in support for it. It becomes a bit more challenging if you want to perform real-time face detection on a live webcam stream, for example to implement face tracking or fancy filter effects. But the principles will remain the same.

9 thoughts on “Face Detection on Android and iOS”

    1. The demo only works on Android and iOS. As the article mentions, on other platforms a no-op version is used that doesn’t do anything.


      1. I’m sorry; I ran it directly when it loaded via the .dproj, without noticing that it defaulted to a Win32 debug application. After switching the target to Android it runs well on the pictures. I hope this can also be used for live camera face detection. Thank you and good luck with the update 😀


  1. hi,
    how do we get the rectangles of the faces? I mean, for each person's face in the image, how do we copy it to a list? What function should we update, change or add?


    1. I don’t know exactly what you mean. The DetectFaces method already returns a list (array) of TgoFace records. You can add the Bounds value of that record to a TList<TRectF> or whatever.

      If you want a single rectangle surrounding ALL faces, then you can use TRectF.Union, as in:

      Faces := MyFaceDetector.DetectFaces(MyBitmap);
      if (Faces <> nil) then
      begin
        Bounds := Faces[0].Bounds;
        for I := 1 to Length(Faces) - 1 do
          Bounds.Union(Faces[I].Bounds);
      end;
      


      1. Actually, we want to cut out or copy the face rectangles into a list of faces, like in face recognition, where we only need the face rectangle and not the body or the entire picture. How do we do that?


  2. OK. You can use TBitmap or TCanvas for that. You can create a new TBitmap with the dimensions of a face rectangle, and then use TBitmap.CopyFromBitmap to copy a segment from the original bitmap to your new bitmap. Or you can use the Canvas property of a bitmap and use its DrawBitmap method to copy pieces of bitmaps around.

    Please see the (documentation of the) TBitmap and TCanvas classes for more information on this.
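    A hedged sketch of the CopyFromBitmap approach (MyFaceDetector and MyBitmap are assumed to exist; error handling and ownership of the created bitmaps are omitted):

    var
      Face: TgoFace;
      FaceRect: TRect;
      FaceBitmap: TBitmap;
      FaceBitmaps: TList<TBitmap>;
    begin
      FaceBitmaps := TList<TBitmap>.Create;
      for Face in MyFaceDetector.DetectFaces(MyBitmap) do
      begin
        FaceRect := Face.Bounds.Round; { TRectF to integer TRect }
        FaceBitmap := TBitmap.Create(FaceRect.Width, FaceRect.Height);
        { Copy just the face region from the source bitmap. }
        FaceBitmap.CopyFromBitmap(MyBitmap, FaceRect, 0, 0);
        FaceBitmaps.Add(FaceBitmap);
      end;
    end;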

