ARKit & Other Face Tracking Mistakes

Why do so many face trackers and avatars suffer from false-positive brow lowering? Using Apple’s ARKit as a case study, this breakdown explores the root of the issue and ends with a surprisingly simple fix.

Avatars & Smiles: Memoji Edition

False-positive brow lowering is a widespread issue in face tracking and avatar animation. While subtle, these errors can distort an avatar’s expressions, introducing unintended cues for sadness, anger, or concern. In the video below, I stress-test Apple’s Memoji to demonstrate this problem in action.

 

In the clip above, observe how my Memoji “mirrors” my smile. Notice anything strange or unpleasant? My inner brow corners are erroneously lowered. This misattributed brow lowering persists across various types of smiles. It shows up for other users as well.

Lowered inner brows coupled with a raised top lip (caused by nose wrinkler or upper lip raiser) are often associated with unpleasant emotions. Lowered brow tips are also negatively correlated with smiling. Semantically significant tracking mistakes like these can unintentionally imply negative sentiment.

How These Mistakes Come About

False-positive brow lowering often stems from issues in data quality, labeling errors, and artistic choices. Here’s why these problems persist:

  1. There is not enough focus on data quality.
  2. There is not enough focus on art.

Everywhere I’ve worked, there has been so much concern over hiring engineers with a specific background that headcount is taken away from other essential roles.

Many tech leads assume that if they acquire enough data to train their model, problems with quality will simply work themselves out. Wow! Magic. This assumption often operates under an additional (but false) belief that there is only a negligible percentage of impure data.

I’ve been deep in the data trenches and have worked almost every non-engineering role in face tracking:

  • data planning – determining what type of expression data to collect and how to collect it
  • data collection – actually working with participants and training them to hit the right expression poses
  • data annotation – determining the best ways to label landmarks
  • data classification – advising engineering on which classes should exist, what their parameters are, and how to handle their inevitable edge cases
  • scaling up – making sure rules for annotation and classification are standardized and easy for mass-scale labelers to understand
  • monitoring tracking – comparing ground truth with tracking outcomes
  • identifying areas for improvement – figuring out what problems exist and how they can be improved via planning, collection, annotation, and/or classification
  • avatar development – strategizing which shapes to prioritize based on a mixture of considerations such as –
    • where the tracker fails
    • what the final product use cases are
    • what will be most aesthetically pleasing
    • what is most semantically important

With my background in expression science, facial anatomy, and extensive work with facial data, I can confidently say impure data is a major issue in face tracking, not a minor one. In short, even if the algorithm is perfect, problems arise from:

Impure Posed Data

  • When collecting posed expression data from human participants, the data will always be contaminated. Guaranteed.
  • Most people cannot hit every target expression. It is rare to find pure facial action data. When participants perform impure expressions, it is because they are either displaying the wrong facial action or because they are unable to isolate the target expression without employing additional, non-target facial muscles.
  • To top it all off, data acquisitionists often cannot tell whether the participant is even hitting the target expression. This lack of knowledge is not the fault of the data acquisitionists. It is the result of Company X’s misplaced priorities and its failure to hire or keep the right talent.

Bad Data Labeling

  • Due to a hyper-focus on engineering talent, companies typically neglect prioritizing data labeling roles. Instead, annotation efforts are often treated as low-level positions designated for contractors with no particular expertise.
  • Contractors who start out with little experience can eventually become attuned enough to the data to develop real expertise; however, this rarely happens, because contract labeling roles generally have high turnover.

Uninformed Art Choices

  • It is important for art to understand the tech, and for tech to understand the art. There is a staggering disconnect between art and engineering within tech companies. In many cases, good trackers look good not because of the trackers themselves, but because of the artistic choices made to combat immature tech behind the scenes. More resources should be invested toward bridging the gap between art and engineering.

Back to Memoji & False Positive Brow Lowering

As mentioned at the start of this post, whenever Memoji attempts to mirror my smile, there is an unnecessary addition of brow lowering at the inner corners of my brow. Again, this addition persists in various types of smiles.

The chart below is an example of how the interaction between data quality and art can affect various expressions in negative ways and cause issues like brow lowering with smiles.

NOTE: There can, of course, be multiple potential causes, including issues with the algorithm itself; however, this outline highlights one specific scenario with a specific set of conditions.

[Figure: diagram of face tracking and avatar issues using FACS shapes]

Breaking It Down

Nose wrinkler and upper lip raiser are two facial actions that look similar.

  • They are often confused with each other at both the data collection level and the data labeling level.
  • Because tech companies do not properly invest in data quality, they usually do not foster employees who can accurately identify or explain how to differentiate nose wrinkler and upper lip raiser.
  • Mistakes in both data collection and data labeling go unnoticed, and talent is unable to catch tracking errors.
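
One way to make this kind of confusion visible is to compare labeler output against a small expert-reviewed reference set. The snippet below is only a minimal sketch of that idea: the label names, the function names (`confusion_counts`, `swap_rate`), and the five example frames are all made up for illustration and do not come from any real dataset or tool.

```python
from collections import Counter


def confusion_counts(reference, labels):
    """Count (reference label, assigned label) pairs across frames."""
    return Counter(zip(reference, labels))


def swap_rate(reference, labels, a="nose_wrinkler", b="upper_lip_raiser"):
    """Share of frames whose reference is `a` or `b` where the labeler chose the other."""
    counts = confusion_counts(reference, labels)
    relevant = sum(n for (ref, _), n in counts.items() if ref in (a, b))
    swapped = counts[(a, b)] + counts[(b, a)]
    return swapped / relevant if relevant else 0.0


# Hypothetical frames: an expert reference vs. one labeler's output.
expert = ["nose_wrinkler", "upper_lip_raiser", "upper_lip_raiser",
          "nose_wrinkler", "neutral"]
labeler = ["upper_lip_raiser", "upper_lip_raiser", "nose_wrinkler",
           "nose_wrinkler", "neutral"]

rate = swap_rate(expert, labeler)
print(f"nose wrinkler / upper lip raiser swap rate: {rate:.0%}")
```

A check like this only helps if someone on the team can produce a trustworthy reference set in the first place, which is exactly the expertise gap described above.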

A common technique in art is to use upper lip raiser as a combo shape add-on for smiles.

  • Because the action of lip corner puller lifts the top lip when a smile is intense, many assume this movement is synonymous with upper lip raiser. It is not quite that simple.
  • Many artists use the upper lip raiser shape to combine with lip corner puller to create a strong smile. More details here.
  • Aesthetic and accuracy issues aside, a glaring problem remains: If upper lip raiser is tied to nose wrinkler, a strong smile will activate upper lip raiser, which will then activate nose wrinkler. And because nose wrinkler is tied to brow lowering under FACS coding rules, the brow lowering comes along for the ride. Therefore, when someone smiles, their eyebrows will lower. Bad.
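
The chain above can be sketched as a toy rig in which activating one shape drags others along. This is an illustration under stated assumptions, not Apple's implementation or any production rig: the shape names, the `COUPLINGS` table, and every gain value are hypothetical.

```python
# Toy coupling table: driver shape -> {driven shape: gain}.
# All names and gains are hypothetical, chosen only to illustrate the chain.
COUPLINGS = {
    # Art choice: a strong smile adds upper lip raiser as a combo shape.
    "lip_corner_puller": {"upper_lip_raiser": 0.5},
    # Data/labeling confusion: upper lip raiser is treated like nose wrinkler.
    "upper_lip_raiser": {"nose_wrinkler": 1.0},
    # FACS-style shape design: nose wrinkler carries brow lowering with it.
    "nose_wrinkler": {"brow_lowerer": 0.6},
}


def propagate(inputs, couplings):
    """Push activations through the coupling chain and return the final weights."""
    weights = dict(inputs)
    frontier = list(inputs.items())
    while frontier:
        driver, value = frontier.pop()
        for driven, gain in couplings.get(driver, {}).items():
            induced = value * gain
            # Keep the strongest activation seen so far for each shape.
            if induced > weights.get(driven, 0.0):
                weights[driven] = induced
                frontier.append((driven, induced))
    return weights


# The user only smiles; nothing else is performed.
result = propagate({"lip_corner_puller": 1.0}, COUPLINGS)
for shape, weight in result.items():
    print(f"{shape}: {weight:.2f}")
```

With these made-up gains, a pure smile ends up with a nonzero `brow_lowerer` weight even though no brow movement was ever performed. Removing any one link in the table breaks the chain.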

Not enough attention is paid to data quality.
Not enough credit is given to art.

Fixing the Issue

The simplest fix? Step away from FACS and lean on facial anatomy. The easiest way to reduce false-positive brow lowering via nose wrinkler and upper lip raiser is to honor the properties of the muscles behind these actions.

From an anatomical standpoint, neither levator labii superioris (upper lip raiser muscle) nor levator labii superioris alaeque nasi (nose wrinkler muscle) lowers the eyebrows. It is only within the confines of FACS facial coding rules that brow lowering is tied to nose wrinkling.

You can save yourself a headache if you simply:

  1. redesign your labeling rules to classify brow lowerer as a separate action from nose wrinkler
  2. redesign your blendshapes to host a nose wrinkler without inherent brow lowering
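
As a rough sketch of the second step, the snippet below strips the brow region out of a FACS-style nose wrinkler shape so that only a dedicated brow lowerer shape can move the brows. Everything here is assumed for illustration: the region names, the offsets, and the helpers `split_out_brow` and `pose` are hypothetical, and a real rig would operate on vertex deltas rather than three named regions.

```python
# Hypothetical per-region offsets for each blendshape.
# A negative "inner_brow" value means the inner brow is pulled down.
FACS_STYLE_SHAPES = {
    "nose_wrinkler": {"nose_bridge": 1.0, "upper_lip": 0.3, "inner_brow": -0.6},
    "brow_lowerer": {"inner_brow": -1.0},
}


def split_out_brow(shapes, brow_region="inner_brow", brow_shape="brow_lowerer"):
    """Return a copy in which only `brow_shape` is allowed to move `brow_region`."""
    return {
        name: {
            region: offset
            for region, offset in deltas.items()
            if region != brow_region or name == brow_shape
        }
        for name, deltas in shapes.items()
    }


def pose(shapes, weights):
    """Linear blend: sum each shape's region offsets scaled by its weight."""
    out = {}
    for name, weight in weights.items():
        for region, offset in shapes[name].items():
            out[region] = out.get(region, 0.0) + weight * offset
    return out


ANATOMY_STYLE_SHAPES = split_out_brow(FACS_STYLE_SHAPES)

# Tracker output for a frame with nose wrinkling and no real brow lowering.
tracked = {"nose_wrinkler": 0.5, "brow_lowerer": 0.0}

before = pose(FACS_STYLE_SHAPES, tracked)
after = pose(ANATOMY_STYLE_SHAPES, tracked)
print("FACS-style inner brow offset:", before["inner_brow"])
print("Anatomy-style inner brow offset:", after["inner_brow"])
```

In this toy setup the redesigned nose wrinkler leaves the brows alone, and brow lowering appears only when the tracker actually reports it. The same separation has to exist in the labeling rules for the tracker's brow signal to be trustworthy.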

