ARKit & other face tracking mistakes

This post is a follow-up regarding the videos I have been making to evaluate the quality of various tracking kits. Here, I go more in depth on a particular issue I have been seeing in Animoji.

smiles in Animoji

In the video below, watch how Animoji mirrors my smile. There is an unnecessary addition of brow lowering at the inner corners of my eyebrows. This addition persists across various types of smiles, both authentic and inauthentic. It also shows up for some other users.

Lowered inner brows coupled with a raised top lip (caused by nose wrinkler or upper lip raiser) are often associated with disgust. Additionally, brow lowering is negatively correlated with smiling. These types of semantically significant tracking mistakes unintentionally portray negative sentiment.


how these mistakes come about

  1. Too much focus on engineering. Not enough focus on data quality.
  2. Too much focus on engineering. Not enough focus on art.

Everywhere I’ve worked, there has been so much concern over hiring engineers with a specific background that headcount is taken away from other essential roles.

Many tech leads live under the assumption that, if they acquire enough data to train their model, problems with quality will simply work themselves out. Wow! Magic. This assumption often rests on an additional (but false) assumption: that only a negligible percentage of the data is impure.

I have been deep in the data trenches and have worked almost every non-engineering role in face tracking:

  • data planning – determining what type of expression data to collect and how to collect it
  • data collection – actually working with participants and training them to hit the right expression poses
  • data annotation – determining the best ways to label landmarks
  • data classification – advising engineering on which classes should exist, what their parameters are, and how to handle their inevitable edge cases
  • scaling up – making sure rules for annotation and classification are standardized and easy for mass-scale labelers to understand
  • monitoring tracking – comparing ground truth with tracking outcomes
  • identifying areas for improvement – figuring out what problems exist and how they can be improved via planning, collection, annotation, and/or classification
  • avatar development – strategizing which shapes to prioritize based on a mixture of considerations such as –
    • where the tracker fails
    • what the final product use cases are
    • what will be most aesthetically pleasing
    • what is most semantically important

As someone with a technical background in expression science and facial anatomy, who has also served all these functions, I am here to tell you: The amount of impure data in face tracking tech is far from negligible. In short, even if the algorithm is perfect, problems arise from:

  • impure posed data
    • When collecting posed expression data from human participants, the data will always be contaminated. Guaranteed.
    • Most people cannot hit every target expression. It is rare to find pure facial action data. When participants perform impure expressions, it is because they are either displaying the wrong facial action or because they are unable to isolate the target expression without employing additional, non-target facial muscles.
    • To top it all off, data acquisitionists often cannot tell whether the participant is even hitting the target expression. This lack of knowledge is not the fault of the data acquisitionists, but rather of Company X’s misplaced priorities and lack of attention to hiring or keeping the right talent.
  • bad data labeling
    • Because most tech companies are so focused on finding engineering talent, they neglect to prioritize roles related to data labeling. Instead, labeling-related efforts are often treated as low-level positions designated for contractors with no particular expertise.
    • Contractors starting out with little experience can eventually become in tune enough with the data to gain expertise; however, this rarely happens, because contract labeling roles often have high turnover.
  • uninformed art choices
    • It is important for art to understand the tech, and for tech to understand the art. Building that understanding on both sides is not stressed enough. There is often a large disconnect between art and engineering. For this reason, there should be more roles built to understand both sides: “Creative Technologist”-type roles.
    • The reason good trackers look good is not usually because of the trackers themselves, but because of the artistic choices made to combat immature tech behind the scenes.

back to Animoji and why my inner brow corners lower whenever I smile

As mentioned at the start of this post, whenever Animoji attempts to mirror my smile, there is an unnecessary addition of brow lowering at the inner corners of my brow. Again, this addition persists across various types of smiles, both authentic and inauthentic.

The chart below is an example of how the interaction between data quality and art can affect various expressions in negative ways and cause issues like brow lowering with smiles. (Yes, there are multiple potential causes, including issues with the algorithm itself; however, this is an outline of a scenario with a specific set of conditions.)


breaking it down

Nose wrinkler and upper lip raiser are two facial actions that look similar.

  • They are often confused with each other at both the data collection level and the data labeling level.
  • Because many tech companies do not invest in data quality as much as they should, they usually do not house employees who can accurately understand or explain how to differentiate nose wrinkler and upper lip raiser.
  • Mistakes in both data collection and data labeling go unnoticed, and talent is unable to catch tracking errors.
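
One way to make this kind of confusion visible is to have someone with expression expertise re-label a sample of frames, then count how often the two actions get swapped. The sketch below is only an illustration under my own assumptions: the label names, the `swap_rate` helper, and the sample labels are all made up for the example, not taken from any real pipeline.

```python
from collections import Counter

# Hypothetical label names, chosen for this example only.
NOSE_WRINKLER = "nose_wrinkler"
UPPER_LIP_RAISER = "upper_lip_raiser"


def confusion_counts(labeler, expert):
    """Count (labeler_label, expert_label) pairs over the same frames."""
    if len(labeler) != len(expert):
        raise ValueError("label lists must cover the same frames")
    return Counter(zip(labeler, expert))


def swap_rate(labeler, expert, a=NOSE_WRINKLER, b=UPPER_LIP_RAISER):
    """Of the frames the expert marked as a or b, the fraction where
    the labeler picked the other one of the pair."""
    counts = confusion_counts(labeler, expert)
    swapped = counts[(a, b)] + counts[(b, a)]
    relevant = sum(n for (_, e), n in counts.items() if e in (a, b))
    return swapped / relevant if relevant else 0.0


# Made-up sample, not real annotation data.
labeler = [NOSE_WRINKLER, NOSE_WRINKLER, UPPER_LIP_RAISER, UPPER_LIP_RAISER, "smile"]
expert = [NOSE_WRINKLER, UPPER_LIP_RAISER, UPPER_LIP_RAISER, NOSE_WRINKLER, "smile"]
rate = swap_rate(labeler, expert)
```

A high swap rate on a sample like this would suggest the labeling rules for these two actions need attention before more data is collected.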

A common technique in art (one I advise against) is to use upper lip raiser as a combo shape add-on for smiles.

  • Because the action of lip corner puller lifts the top lip when a smile is intense, many assume this movement is synonymous with upper lip raiser. It is not.
  • Many artists use the upper lip raiser shape to combine with lip corner puller to create a strong smile. More details here.
  • Even if I didn’t have strong aesthetic and accuracy-based issues with this technique, there would still be a big problem:
    • If upper lip raiser is tied to nose wrinkler, a strong smile will activate upper lip raiser, which will then activate nose wrinkler. Because nose wrinkler pulls the inner brows down, the result is that when someone smiles, their eyebrows lower. Bad.
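
The chain reaction described above can be sketched as a toy rig. This is a minimal illustration under stated assumptions, not how Animoji or any particular tracker works: the shape names, the link table, the gains, and the combo threshold are all hypothetical values I picked for the example.

```python
# Hypothetical rig links: driver shape -> list of (driven shape, gain).
# All names and numbers here are invented for illustration.
RIG_LINKS = {
    # Art choice: upper lip raiser used as a combo add-on for strong smiles.
    "lip_corner_puller": [("upper_lip_raiser", 0.5)],
    # Data issue: upper lip raiser and nose wrinkler were conflated.
    "upper_lip_raiser": [("nose_wrinkler", 0.8)],
    # Nose wrinkler brings the inner brows down with it.
    "nose_wrinkler": [("inner_brow_lowerer", 0.6)],
}


def resolve(tracked, links=RIG_LINKS, combo_threshold=0.7):
    """Propagate tracked shape weights through the rig links."""
    weights = dict(tracked)
    pending = list(tracked)
    while pending:
        driver = pending.pop()
        for driven, gain in links.get(driver, []):
            # The smile combo only kicks in for intense smiles.
            if driver == "lip_corner_puller" and weights[driver] < combo_threshold:
                continue
            pushed = min(1.0, weights[driver] * gain)
            # Only raise a weight, never lower it, then keep propagating.
            if pushed > weights.get(driven, 0.0):
                weights[driven] = pushed
                pending.append(driven)
    return weights


strong = resolve({"lip_corner_puller": 0.9})  # intense smile only
weak = resolve({"lip_corner_puller": 0.4})    # mild smile only
```

In this toy setup, the strong smile ends up with a non-zero inner brow lowerer weight even though the tracker only reported lip corner puller, while the mild smile stays clean because the combo never fires.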

Not enough attention is paid to data quality.
Not enough credit is given to art.

fixing the issue

Going back to my post on Big Tech’s Homogenous Hiring Habits, these problems can be mitigated by incorporating hiring strategies with less tunnel vision. Machine learning has advanced to a point of requiring cross-disciplinary expertise. Hire the right people, and don’t be shortsighted regarding talent needs.

