Biases in Face & Emotion Tracking
We seem to subscribe to the popular oversimplification that machines are less biased than humans. But if you are familiar with how machines are trained to read and weigh different aspects of data, you will know: it’s just not that simple.
Machines are not free of bias if they are trained by humans.
The following is a presentation of the different types of biases that can occur in face tracking and expression labeling. Many of these biases can be reduced, so I have also included suggestions for improved methods. If you are working on face & emotion tracking of any type, it is your responsibility to be aware of these biases.
View the slides below OR the video linked here: YouTube Video


Text: Many face and emotion tracking companies try to use Paul Ekman’s Facial Action Coding System (FACS)
– but many don’t take the time to use it right.

Text: What happens when you don’t use FACS right?
- incorrect expression classification
- inconsistent expression classification
- biased labeling (racial, cultural, age-related, etc.)
- anarchy
Even if you use FACS properly, there will always be bias and inconsistency – but by taking careful measures, these issues can be significantly reduced.

Text: incorrect expression classification
- Facial actions are subtle and difficult to differentiate without intensive study.
- Most FACS references (excluding the original FACS Manual) provide incorrect FACS visuals – even sources considered credible.
- Despite these inaccuracies, such sources are often used as references by face tracking engineers and researchers.
- Because tech companies do not invest enough in data-based roles, they likely do not possess the right staff or resources to differentiate important facial actions.


Text: incorrect expression classification
- Basic shapes like “lip tightener” regularly get confused with actions like “lip presser” and/or “lip pucker.”
- Lip tightener is important in: emotion expressions & speech production

Text: incorrect expression classification
- Above is a true representation of lip tightener.
- This is just one of many shapes that fly under the radar each time they are:
– mistaught – misclassified – misused

Text: What happens when you don’t use FACS right?
- incorrect expression classification
- inconsistent expression classification
- biased labeling (racial, cultural, age-related, etc.)
- anarchy
Even if you use FACS properly, there will always be bias and inconsistency – but by taking careful measures, these issues can be significantly reduced.

Text: incorrect expression classification
The same issues surrounding incorrect classification also contribute to inconsistent classification.

Text: inconsistent expression classification
If tech companies don’t thoroughly invest in data quality, their data classification rules cannot be standardized.
Due to:
- a lack of investment in hiring and/or training employees for data-based roles
- a lack of quality FACS resources
- an inherent difficulty in differentiating facial actions
→ Labelers classify expressions inconsistently.
→ Trackers develop weird quirks, tying incorrect expressions to each other while confusing others.
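This kind of inconsistency is measurable. Below is a minimal sketch – the clips and labels are invented for illustration – that scores agreement between two labelers with Cohen’s kappa; in practice you would compare the FACS codes that trained labelers assigned to the same footage:

```python
# Minimal sketch (hypothetical labels): measuring inter-rater agreement
# between two labelers who coded the same eight clips.
from sklearn.metrics import cohen_kappa_score

labeler_a = ["lip tightener", "lip presser", "lip tightener", "lip pucker",
             "neutral", "lip tightener", "lip presser", "neutral"]
labeler_b = ["lip presser", "lip presser", "lip pucker", "lip pucker",
             "neutral", "lip presser", "lip presser", "neutral"]

kappa = cohen_kappa_score(labeler_a, labeler_b)
print(f"Cohen's kappa: {kappa:.2f}")  # well below 1.0 -> labeling is not standardized
```

Low agreement on the same footage is a direct, measurable symptom of the standardization problem described above.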

Text: inconsistent expression classification
NOTE: This diagram was made to explain problems in shape activation for avatars, but the same basic concepts apply to face and emotion tracking.

Text: What happens when you don’t use FACS right?
- incorrect expression classification
- inconsistent expression classification
- biased labeling (racial, cultural, age-related, etc.)
- anarchy
Even if you use FACS properly, there will always be bias and inconsistency – but by taking careful measures, these issues can be significantly reduced.

Text: biased labeling (racial, cultural, age-related, etc.)
When expression labels are ill-defined and poorly understood, small mistakes can create big biases.

Text: biased labeling (racial, cultural, age-related, etc.)
Before going into the biases that can exist in face tracking tech, consider the already-existing biases in feature identification tech: facial recognition.

Text: biased labeling (racial, cultural, age-related, etc.)
If we can’t even get feature detection right, imagine how complicated bias can be in expression tracking. Expression tracking demands an understanding beyond facial features. It demands an understanding of subtle facial movements.

– Blais, C., Jack, R. E., Scheepers, C., Fiset, D., & Caldara, R. (2008). Culture shapes how we look at faces. PLoS ONE, 3(8), e3022. doi:10.1371/journal.pone.0003022

Text: biased labeling (racial, cultural, age-related, etc.)
- People extract information differently when looking at faces and search for different clues when interpreting emotions.
- Culture deeply influences what we consider valuable information and clues.
Studies have found:
- East Asian participants tend to focus on the middle of the face, around the nose, giving more importance to the eyes and gaze direction.
- Western Caucasian participants tend to look for expressions of emotion in the eyebrow and mouth areas.
- These differences in attention create biases when participants look at faces with conflicting expressions.
e.g. When there are sad eyes with a happy mouth:
– Japanese participants give more importance to the emotion shown in the eyes.
– American participants give more importance to the mouth area.
1. Blais, C., Jack, R. E., Scheepers, C., Fiset, D., & Caldara, R. (2008). Culture shapes how we look at faces. PLoS ONE, 3(8), e3022. doi:10.1371/journal.pone.0003022
2. Elfenbein, H. A., & Ambady, N. (2003). Universals and cultural differences in recognizing emotions. Current Directions in Psychological Science, 12(5), 159-164.
3. Matsumoto, D., Kasri, F., & Kooken, K. (1999). American-Japanese cultural differences in judgements of expression intensity and subjective experience. Cognition & Emotion, 13(2), 201-218.
4. Matsumoto, D., & Ekman, P. (1989). American-Japanese cultural differences in intensity ratings of facial expressions of emotion. Motivation and Emotion, 13(2), 143-157.
5. Marsh, A. A., Elfenbein, H. A., & Ambady, N. (2003). Nonverbal “accents”: Cultural differences in facial expressions of emotion. Psychological Science, 14(4), 373-376.
6. Yuki, M., Maddux, W. W., & Masuda, T. (2007). Are the windows to the soul the same in the East and West? Cultural differences in using the eyes and mouth as cues to recognize emotions in Japan and the United States. Journal of Experimental Social Psychology, 43(2), 303-311.

Text: biased labeling (racial, cultural, age-related, etc.)
Now consider what goes into expression labeling . . .

Text: biased labeling (racial, cultural, age-related, etc.)
- If lip corner depressor is not understood correctly, labelers are more likely to incorrectly classify any shape with drooping lip corners as lip corner depressor. These misclassified shapes are often caused by the effects of chin raiser – but they can also include something worse: neutral faces (faces with no expression).
- One likely outcome of this misconception is overdetection of lip corner depressor in people with drooping lip corners. Oftentimes, the elderly have drooping mouth corners due to the long-term effects of gravity.

Text: biased labeling (racial, cultural, age-related, etc.)
- As a result of improper labeling practices, older demographics may be subject to overdetection of lip corner depressor.
- Lip corner depressor is a key component of sadness. Could this error lead to overdetection of sadness in older demographics?
- What about other groups whose facial features are characterized by drooping lip corners? Will they be incorrectly interpreted as “sad”?
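If you want to catch this in your own pipeline, one option is a per-group audit. Below is a minimal sketch with invented records – the age groups, labels, and counts are all hypothetical – that compares how often a tracker fires lip corner depressor on faces that trained coders marked as neutral:

```python
# Minimal sketch (hypothetical records): per-age-group false-positive audit
# for lip corner depressor on ground-truth NEUTRAL faces.
from collections import defaultdict

# (age_group, ground_truth, tracker_output) - invented for illustration.
records = [
    ("18-39", "neutral", "neutral"),
    ("18-39", "neutral", "neutral"),
    ("18-39", "neutral", "lip corner depressor"),
    ("65+", "neutral", "lip corner depressor"),
    ("65+", "neutral", "lip corner depressor"),
    ("65+", "neutral", "neutral"),
]

false_positives = defaultdict(int)
neutral_counts = defaultdict(int)
for group, truth, predicted in records:
    if truth == "neutral":
        neutral_counts[group] += 1
        if predicted == "lip corner depressor":
            false_positives[group] += 1

for group in sorted(neutral_counts):
    rate = false_positives[group] / neutral_counts[group]
    print(f"{group}: false-positive rate on neutral faces = {rate:.0%}")
```

A markedly higher rate for older faces is exactly the overdetection described above.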

Text: biased labeling (racial, cultural, age-related, etc.)
- It may not seem like a big deal for a tracker to incorrectly detect lip corner depressor or sadness – but what happens when face tracking is used for bigger things, like mental health assessments or evaluating potential job candidates?

Text: biased labeling (racial, cultural, age-related, etc.)
- AI Is Now Analyzing Candidates’ Facial Expression During Video Job Interviews – Unilever, IBM, Dunkin Donuts and many others are already using this technology
- Emotion-recognition technology doesn’t work, but hiring professionals, others are using it anyway: report
- ‘Emotion detection’ AI is a $20 billion industry. New research says it can’t do what it claims.

Text: biased labeling (racial, cultural, age-related, etc.)
We talked about how improper labeling can incite bias against older populations . . . But what about other biases toward people with certain facial features?

Text: biased labeling (racial, cultural, age-related, etc.)
“. . . facial recognition programs exhibit two distinct types of bias. First, black faces were consistently scored as angrier than white faces for every smile. Face++ showed this type of bias. Second, black faces were always scored as angrier if there was any ambiguity about their facial expression. Face API displayed this type of disparity. Even if black faces are partially smiling, my analysis showed that the systems assumed more negative emotions as compared to their white counterparts with similar expressions. The average emotional scores were much closer across races, but there were still noticeable differences for black and white faces.”
– Lauren Rhue, “Understanding the Hidden Bias in Emotion-Reading AIs”
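This is not Rhue’s actual code or data, but a minimal sketch of the kind of disparity check her article describes: compare mean anger scores across races for faces with matched smile scores. All names and numbers below are invented for illustration.

```python
# Minimal sketch (hypothetical scores): check whether an emotion API assigns
# different anger scores to different races at the SAME smile intensity.
import statistics

# (race, smile_score, anger_score) records from a hypothetical API.
scores = [
    ("black", 0.8, 0.30), ("black", 0.8, 0.25), ("black", 0.8, 0.35),
    ("white", 0.8, 0.10), ("white", 0.8, 0.15), ("white", 0.8, 0.12),
]

by_race = {}
for race, smile, anger in scores:
    by_race.setdefault(race, []).append(anger)

for race, angers in sorted(by_race.items()):
    print(f"{race}: mean anger at matched smile = {statistics.mean(angers):.2f}")
# Similar expressions, systematically different anger scores -> disparate impact.
```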

Text: biased labeling (racial, cultural, age-related, etc.)
How might this happen?
Let’s revisit WHY IT’S IMPORTANT TO INVEST MORE IN DATA.

Text: biased labeling (racial, cultural, age-related, etc.)
Consider the FACS shape “upper lip raiser.” Upper lip raiser is a key component of emotions like disgust, anger, and contempt.
Features of upper lip raiser include:
- raised upper lip
- rounded upper nasolabial furrow region
(laugh line – see photo)
NOTE: On top of potential facial action labeling biases, there is also significant controversy over the Basic Emotion Theory (e.g. basic emotion prototypes like contempt, anger, etc.) in general. See “‘It’s All In the Eyes’ and Other Lies.“

Text: biased labeling (racial, cultural, age-related, etc.)
Some people have more curve in their nasolabial furrow.
This is simply a product of their naturally occurring face structure.

Text: biased labeling (racial, cultural, age-related, etc.)
A labeler not properly versed in the Facial Action Coding System can easily mislabel people with non-expression-based nasolabial furrow curves as having upper lip raiser.
These incorrect labels birth a tracker trained to identify faces with certain structures as expressing upper lip raiser – even when they are neutral or smiling.
Procuring a “diverse data set” is irrelevant if your labels are inaccurate.
Such training explains Lauren Rhue’s findings from Understanding the Hidden Bias in Emotion-Reading AIs.
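Here is a toy demonstration of that mechanism – synthetic data and my own illustration, not any vendor’s model. If labelers mark strongly curved nasolabial furrows as upper lip raiser, a classifier trained on those labels learns face structure instead of expression:

```python
# Toy demonstration (synthetic data): biased labels teach the model anatomy,
# not expression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
furrow_curve = rng.uniform(0, 1, n)      # static anatomy; varies by person
lip_raised = rng.integers(0, 2, n)       # the actual expression (0 or 1)
X = np.column_stack([furrow_curve, lip_raised + rng.normal(0, 0.1, n)])

# Biased labels: anything with a strongly curved furrow gets marked
# "upper lip raiser," whether or not the lip actually moved.
y_biased = ((lip_raised == 1) | (furrow_curve > 0.7)).astype(int)

model = LogisticRegression().fit(X, y_biased)

neutral_curved_face = [[0.9, 0.0]]       # no expression, curved furrow
print(model.predict_proba(neutral_curved_face)[0, 1])  # elevated probability
```

The model never saw this face raise its upper lip; it learned that a curved furrow is enough.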

Text: biased labeling (racial, cultural, age-related, etc.)
Biased emotion labeling will persist even if you move away from a FACS-based approach and have labelers classify emotions holistically.

Text: biased labeling (racial, cultural, age-related, etc.)
In case it is not clear why holistic emotion labeling is just as likely to fail (if not more so) – HERE IS A REFRESHER:
- People extract information differently when looking at faces and search for different clues when interpreting emotions.
- Culture deeply influences what we consider valuable information and clues.
Studies have found:
- East Asian participants tend to focus on the middle of the face, around the nose, giving more importance to the eyes and gaze direction.
- Western Caucasian participants tend to look for expressions of emotion in the eyebrow and mouth areas.
- These differences in attention create biases when participants look at faces with conflicting expressions.
e.g. When there are sad eyes with a happy mouth:
– Japanese participants give more importance to the emotion shown in the eyes.
– American participants give more importance to the mouth area.

Text: What happens when you don’t use FACS right?
- incorrect expression classification
- inconsistent expression classification
- biased labeling (racial, cultural, age-related, etc.)
- anarchy
Even if you use FACS properly, there will always be bias and inconsistency – but by taking careful measures, these issues can be significantly reduced.

Text: anarchy
If . . .
- culture influences how we interpret expressions and emotions
- environmental conditions affect which clues we look at to evaluate faces
- knowledge of facial actions affects how we label expressions
. . . What can tech companies do to reduce these human-based errors and prevent them from leaking into algorithms?

Text: What can tech companies do to reduce biases in face tracking?
- Tech companies must recognize that face tracking has advanced beyond the scope of engineering.
- Tech companies must accept tech’s vulnerability to bias and educate employees on these vulnerabilities.
- Tech companies must invest more in data quality.

Text: Recognizing the advancements of face tracking tech, understanding its vulnerabilities, and investing more in data quality means . . .
- Actually allocating headcount toward roles focused on data quality – even if that requires taking space from engineering headcount.
- Spending time, energy, and resources to find data specialists. If this is not possible, it is still the company’s job to spend time, energy, and resources to TRAIN data specialists. Classifying facial expressions is much more complicated than classifying basic objects like traffic signals and should be treated as such.

Text: Reducing biases in face tracking
Recognizing the advancements of face tracking tech, understanding its vulnerabilities, and investing more in data quality means . . .
- Engineering, research, and product should not simply tell data teams what they need – but LISTEN to what the data teams need. Data teams are the most knowledgeable about the behind-the-scenes work required for functioning algorithms. THIS IS WHERE MANY COMPANIES FAIL.
- Requiring employees with data-based roles to regularly interface with engineering, research, and product teams.

Text: Reducing biases in face tracking
Recognizing the advancements of face tracking tech, understanding its vulnerabilities, and investing more in data quality means . . .
- Educating employees on the reality of bias in tech.
- Taking the right precautions to standardize and define labeling.
- Always considering when and where bias can appear.
– Who is labeling?
– What are they labeling?
– What factors might impact their labeling?
* mood * culture * experience
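One concrete way to keep asking those questions is to record labeler context alongside every label, so skews can be audited later. Below is a minimal sketch; the schema and field names are my own invention, not a standard:

```python
# Minimal sketch (hypothetical schema): store labeling context with each
# label so later bias audits are possible.
from dataclasses import dataclass

@dataclass
class ExpressionLabel:
    clip_id: str            # what is being labeled
    facs_codes: list[str]   # the labeler's classification, e.g. ["AU15"]
    labeler_id: str         # who is labeling
    labeler_training: str   # e.g. "certified FACS coder" vs. "untrained"
    labeler_culture: str    # self-reported; lets you audit cultural skew
    viewed_as_video: bool   # static image vs. movement (see the sidenote below)

label = ExpressionLabel("clip_0042", ["AU15"], "labeler_7",
                        "certified FACS coder", "US", viewed_as_video=True)
```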

Text: Reducing biases in face tracking
Recognizing the advancements of face tracking tech, understanding its vulnerabilities, and investing more in data quality means . . .
SIDENOTE: STOP LABELING EXPRESSIONS FROM STATIC IMAGES.
- Accurate FACS and emotion labeling is largely contingent on seeing movement.
- If movement were stressed more, static facial features such as drooping lip corners and curved nasolabial furrows would be less likely to trigger detection of lip corner depressor and upper lip raiser (respectively).
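A minimal sketch of the alternative – my own illustration, assuming the common 68-point landmark convention – scores an action by displacement from the same person’s neutral frame rather than by the absolute geometry of a single image:

```python
# Minimal sketch: score lip corner depressor (AU15) by MOVEMENT relative to
# the same person's neutral frame, not by static geometry.
import numpy as np

def lip_corner_drop(frame_landmarks, neutral_landmarks, lip_corner_idx=48):
    """Downward displacement of a lip corner versus the person's own neutral
    frame (index 48 follows the common 68-point convention; any scheme works)."""
    dy = frame_landmarks[lip_corner_idx, 1] - neutral_landmarks[lip_corner_idx, 1]
    return max(dy, 0.0)  # positive y = downward in image coordinates

# Hypothetical landmarks: a face whose lip corners droop at rest.
neutral = np.zeros((68, 2))
neutral[48] = [100.0, 210.0]   # droopy corner, but this is the resting face
frame = neutral.copy()         # current frame: no movement at all

print(lip_corner_drop(frame, neutral))  # 0.0 -> no AU15, despite the droop
```

A static-image classifier sees only the droopy geometry; with a neutral baseline and movement, a resting droop scores zero.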

Text: Reducing biases in face tracking
Closing notes . . .

Text: Reducing biases in face tracking
The more we believe in and rely on technology, the more ramifications these biases will have.

Text: Reducing biases in face tracking
“… facial recognition software interprets emotions differently based on the person’s race. …This finding has implications for individuals, organizations, and society, and it contributes to the growing literature of bias and/or disparate impact in AI.”
Rhue, Lauren, Racial Influence on Automated Perceptions of Emotions (November 9, 2018). Available at SSRN: https://ssrn.com/abstract=3281765 or http://dx.doi.org/10.2139/ssrn.3281765

Text: Reducing biases in face tracking
Take responsibility.
Put in the work to reduce bias.
Invest more in data.

Text:
In response to the prevalence of low-quality and inaccurate FACS references, I have created a free “FACS Cheat Sheet” to serve as a guide for artists, researchers, and engineers. It is available on my FACS resource site, Face the FACS. I am also open for consulting.