Tracking What You Cannot See
Throughout my years working in face tracking, I have observed a superstition among engineers and researchers: that you cannot track facial landmarks (features like the eyebrows, eyes, and mouth) once they fall outside the camera's view. This belief is not entirely true. You don't need to see an eyebrow to know whether it's raising or furrowing, and you don't need to see the nose to know when it's wrinkling.
Our faces bulge, stretch, and wrinkle uniquely with each facial action. I have used these changes to train labelers to recognize and accurately classify discrete expressions from skin movement alone. With well-directed documentation, a comprehensive set of examples, and a high-quality camera, you can extract a surprising amount of information from a limited field of view (FOV).
The minimal FOV required for eye tracking is often enough to track a handful of actions. View A (in the image set below) is most reflective of what a gaze-based tracker view may look like. Though the main goal of eye-tracking cameras is to cover just enough of the eye to observe changes in gaze, their potential is much greater. Even with this concentrated view, you can still detect upper lid raiser (AU5), cheek raiser (AU6), and lid tightener (AU7) with a relatively high degree of certainty. These actions are useful for their applications in measuring attention, reactions, and engagement; they are also crucial signals in communication.
Many people get blocked by action unit names like cheek raiser and assume, “We can’t track cheek raiser because our FOV doesn’t cover the cheek area.” But cheek raiser is more than its name reveals; it’s an action caused by the contraction of orbicularis oculi, a muscle surrounding the eye area. While movements of the orbicularis oculi do impact the cheeks, many changes actually take place in the eye socket area. As long as you have a marginal view of the eye corners or a sliver of skin under the lower eyelid, you can determine whether or not cheek raiser is occurring. Similar concepts apply to the other actions I have listed in the images below.
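For readers who think in code, here is a minimal sketch of how these eye-region cues might be turned into simple rule-based detectors for AU5, AU6, and AU7. The landmark names, feature definitions, and thresholds below are hypothetical illustrations of the idea, not values from this article or from any production tracker.

```python
from dataclasses import dataclass


@dataclass
class EyeLandmarks:
    """A few 2D points (pixel coordinates) visible even in a narrow eye-tracking FOV."""
    upper_lid_y: float     # top edge of the upper eyelid
    lower_lid_y: float     # bottom edge of the lower eyelid
    inner_corner_y: float  # inner eye corner
    outer_corner_y: float  # outer eye corner


def detect_eye_region_aus(neutral: EyeLandmarks, current: EyeLandmarks) -> dict[str, bool]:
    """Compare the current frame against a neutral (resting) frame.

    In image coordinates y grows downward, so a positive (neutral - current)
    difference means the point has moved up.
    """
    aperture_neutral = neutral.lower_lid_y - neutral.upper_lid_y
    aperture_now = current.lower_lid_y - current.upper_lid_y

    upper_lid_raise = neutral.upper_lid_y - current.upper_lid_y
    lower_lid_raise = neutral.lower_lid_y - current.lower_lid_y
    corner_raise = ((neutral.inner_corner_y - current.inner_corner_y)
                    + (neutral.outer_corner_y - current.outer_corner_y)) / 2.0

    # Thresholds are placeholders in pixels; a real system would calibrate them
    # per user, camera, and headset fit.
    return {
        # AU5 (upper lid raiser): the upper lid moves up and the eye opens wider.
        "AU5": upper_lid_raise > 2.0 and aperture_now > aperture_neutral,
        # AU6 (cheek raiser): orbicularis oculi pushes the lower lid and eye
        # corners upward, even though the cheek itself is out of view.
        "AU6": lower_lid_raise > 1.5 and corner_raise > 1.0,
        # AU7 (lid tightener): both lids squeeze toward each other.
        "AU7": aperture_now < 0.85 * aperture_neutral and upper_lid_raise <= 0.0,
    }
```

In practice you would feed this from whatever landmark or optical-flow signal your eye camera already produces; the point is only that the decisive evidence for cheek raiser lives inside the eye socket, not on the cheek.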
Capabilities With Different FOVs
This chart shows which action units (AUs) are possible to detect with various fields of view. Keep in mind this is an abridged breakdown of what may or may not be possible with different FOVs. (If you wish to learn about predictions for lower face and combination shapes, I am available for consultation.) Conditions will change based on additional factors such as camera angle and how the headset rests on the face. (Is the headset heavy? How does its weight and pressure affect various areas of the face?)
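Since the chart itself is an image, one way to carry its contents into a codebase is a simple capability lookup. Only the View A entry below is grounded in the text above (AU5, AU6, AU7 from an eye-region view); the other entries are placeholders to be filled in from the original chart.

```python
# Map each field of view to the action units it can plausibly support.
# Only "View A" is taken from the discussion above; the commented-out
# entries are placeholders for the rest of the chart.
FOV_CAPABILITIES: dict[str, set[str]] = {
    "View A (eye region only)": {"AU5", "AU6", "AU7"},
    # "View B (...)": {...},
    # "View C (...)": {...},
}


def detectable_aus(view: str) -> set[str]:
    """Return the AUs a given FOV is expected to support, or an empty set if the view is unknown."""
    return FOV_CAPABILITIES.get(view, set())


print(sorted(detectable_aus("View A (eye region only)")))  # ['AU5', 'AU6', 'AU7']
```

A lookup like this is deliberately coarse; as the article notes, camera angle and headset fit can shift what each view actually supports.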
If you are working on in-headset face tracking, don’t let assumptions limit your potential. The face is complicated and full of clues. All you need to do is find the right clues, and you can accomplish a lot from a little.