Tracking What You Cannot See
Throughout my years working in face tracking, I observed a superstition among engineers and researchers that you cannot track facial landmarks (features like the eyebrows, eyes, and mouth) when they fall outside the camera view. This belief is not entirely true. You don’t need to see an eyebrow to know whether it’s raising or furrowing, and you don’t need to see the nose to know when it’s wrinkling.
Our faces bulge, stretch, and wrinkle uniquely with each facial action. I have used these changes to train labelers to recognize and accurately classify discrete expressions from skin movement alone. With well-directed documentation, a comprehensive set of examples, and a high-quality camera, you can extrapolate a great deal of information from a limited field of view (FOV).
The minimal FOV required for eye tracking is often enough to track a handful of actions. View A (in the image set below) most closely reflects what a gaze-based tracker's view may look like. Though the main goal of eye-tracking cameras is to cover just enough of the eye to observe changes in gaze, their potential is much greater. Even with this concentrated view, you can still detect upper lid raiser (AU5), cheek raiser (AU6), and lid tightener (AU7) with a relatively high degree of certainty. These actions are useful for their applications in measuring attention, reactions, and engagement; they are also crucial signals in communication.
Many people get blocked by action unit names like cheek raiser and assume, “We can’t track cheek raiser because our FOV doesn’t cover the cheek area.” But cheek raiser is more than its name reveals; it’s an action caused by the contraction of orbicularis oculi, a muscle surrounding the eye area. While movements of the orbicularis oculi do impact the cheeks, many changes actually take place in the eye socket area. As long as you have a marginal view of the eye corners or a sliver of skin under the lower eyelid, you can determine whether or not cheek raiser is occurring. Similar concepts apply to the other actions I have listed in the images below.
NOTE TO READERS: I left Facebook in late 2019, because the entire AR/VR organization would only offer me short-term employment.
- My salary was 60% of a UX Researcher's.
- I was not given stock.
- I was ineligible for bonuses.
- I was given a forced fixed salary.
If you find the content in this post helpful, please read “Big Tech’s Homogenous Hiring Habits” and educate yourself on the importance of valuing cross-disciplinary knowledge in emerging technology.
Capabilities With Different FOVs
This chart shows which action units (AUs) can be detected with various fields of view. Keep in mind this is an abridged breakdown of what may or may not be possible with different FOVs. (If you wish to learn about predictions for lower face and combination shapes, I am available for consultation.) Conditions will change based on additional factors such as camera angle and how the headset rests on the face. (Is the headset heavy? How do its weight and pressure affect various areas of the face?)
If you are working on in-headset face-tracking, don’t let assumptions limit your potential. The face is complicated and full of clues. All you need to do is find the right clues, and you can accomplish a lot from a little.
AU1 = inner brow raiser
AU2 = outer brow raiser
AU4 = brow lowerer
AU5 = upper lid raiser
AU6 = cheek raiser
AU7 = lid tightener
AU9 = nose wrinkler
AU10 = upper lip raiser
AU12 = lip corner puller
green box = detectable at most levels of intensity, robust to facial structure
yellow box = detectable at moderate to high intensity levels, less robust to facial structure
orange box = contingent on intensity level, fallible to certain facial structures
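If you want to carry this legend into a labeling or annotation tool, one way to encode it is as a small lookup table. The Python sketch below is illustrative only: the names AU_NAMES, Detectability, VIEW_AUS, and describe_view are hypothetical, not part of any existing library. The only view filled in is View A, using the three AUs named earlier in this post. The color tier for each AU in each view comes from the chart itself and is deliberately left out rather than guessed.

```python
from enum import Enum

# AU codes and names, copied from the legend above.
AU_NAMES = {
    1: "inner brow raiser",
    2: "outer brow raiser",
    4: "brow lowerer",
    5: "upper lid raiser",
    6: "cheek raiser",
    7: "lid tightener",
    9: "nose wrinkler",
    10: "upper lip raiser",
    12: "lip corner puller",
}


class Detectability(Enum):
    """Color tiers from the chart legend (hypothetical enum name)."""
    GREEN = "detectable at most levels of intensity, robust to facial structure"
    YELLOW = "detectable at moderate to high intensity levels, less robust to facial structure"
    ORANGE = "contingent on intensity level, fallible to certain facial structures"


# Assumption: only View A is listed, with the AUs the text says it can detect.
# Other views, and the tier for each AU, must be read off the chart.
VIEW_AUS = {
    "A": (5, 6, 7),
}


def describe_view(view):
    """Return 'AU<code> = <name>' strings for the AUs recorded for a view."""
    return [f"AU{code} = {AU_NAMES[code]}" for code in VIEW_AUS[view]]


if __name__ == "__main__":
    for line in describe_view("A"):
        print(line)
```

Keeping the AU names, the tiers, and the per-view lists in separate structures means you can add views or adjust tiers for a specific headset and camera angle without touching the rest.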