Emotion AI Overview: What is it and how does it work?


Artificial emotional intelligence or Emotion AI is also known as emotion recognition or emotion detection technology. In market research, this is commonly referred to as facial coding.

Humans use a lot of non-verbal cues, such as facial expressions, gestures, body language and tone of voice, to communicate their emotions. Our vision is to develop Emotion AI that can detect emotion just the way humans do, from multiple channels. Our long-term goal is to develop “Multimodal Emotion AI” that combines analysis of both face and speech as complementary signals to provide richer insight into the human expression of emotion. For several years now, Affectiva has been offering industry-leading technology for the analysis of facial expressions of emotions. Most recently, Affectiva has added speech capabilities, now available to select beta testers.

Emotion detection – Face

Our Emotion AI unobtrusively measures unfiltered and unbiased facial expressions of emotion, using an optical sensor or just a standard webcam. Our technology first identifies a human face in real time or in an image or video. Computer vision algorithms identify key landmarks on the face – for example, the corners of your eyebrows, the tip of your nose, the corners of your mouth. Deep learning algorithms then analyze pixels in those regions to classify facial expressions. Combinations of these facial expressions are then mapped to emotions.

In our products, we measure 7 emotion metrics: anger, contempt, disgust, fear, joy, sadness and surprise. In addition, we provide 20 facial expression metrics. In our SDK and API, we also provide emojis, gender, age, ethnicity and a number of other metrics. Learn more about our metrics.

The face provides a rich canvas of emotion. Humans are innately programmed to express and communicate emotion through facial expressions. Affdex scientifically measures and reports the emotions and facial expressions using sophisticated computer vision and machine learning techniques.

Here are some links to other areas of interest:

  • Determining Accuracy
  • Mapping Expressions to Emotions
  • Obtaining Optimal Results

When you use the Affdex SDK in your applications, you will receive facial expression output in the form of Affdex metrics: seven emotion metrics, 20 facial expression metrics, 13 emojis, and four appearance metrics.

Furthermore, the SDK allows for measuring valence and engagement, as alternative metrics for measuring the emotional experience.

Engagement: A measure of facial muscle activation that illustrates the subject’s expressiveness. The range of values is from 0 to 100.

Valence: A measure of the positive or negative nature of the recorded person’s experience. The range of values is from -100 to 100.

How do we map facial expressions to emotions?

The Emotion predictors use the observed facial expressions as input to calculate the likelihood of an emotion.
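
As a rough illustration of the idea that expression scores feed into an emotion likelihood, here is a minimal sketch in Python. It is not Affectiva's model: the weighted-sum form, the `HYPOTHETICAL_WEIGHTS` table and the function name `emotion_likelihoods` are all assumptions made up for this example.

```python
# Hypothetical illustration only: these weights are invented for the example
# and do not describe how the actual emotion predictors are built.
HYPOTHETICAL_WEIGHTS = {
    "joy": {"smile": 0.7, "cheek_raise": 0.3},
    "surprise": {"brow_raise": 0.4, "eye_widen": 0.3, "jaw_drop": 0.3},
}


def emotion_likelihoods(expressions, weights=HYPOTHETICAL_WEIGHTS):
    """Combine 0-100 expression scores into 0-100 emotion scores.

    `expressions` maps an expression name to its score; expressions that
    are missing from the input are treated as 0 (not observed).
    """
    result = {}
    for emotion, parts in weights.items():
        total = sum(w * expressions.get(name, 0.0) for name, w in parts.items())
        # Clamp to the 0-100 range used by the metrics.
        result[emotion] = max(0.0, min(100.0, total))
    return result


if __name__ == "__main__":
    observed = {"smile": 90.0, "cheek_raise": 60.0}
    for emotion, score in emotion_likelihoods(observed).items():
        print(f"{emotion}: {score:.1f}")
```

A real predictor would be learned from labeled data rather than hand-weighted; the sketch only shows the input and output shapes involved.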

Facial Expressions

Attention – Measure of focus based on the head orientation

Brow Furrow – Both eyebrows moved lower and closer together

Brow Raise – Both eyebrows moved upwards

Cheek Raise – Lifting of the cheeks, often accompanied by “crow’s feet” wrinkles at the eye corners

Chin Raise – The chin boss and the lower lip pushed upwards

Dimpler – The lip corners tightened and pulled inwards

Eye Closure – Both eyelids closed

Eye Widen – The upper lid raised sufficiently to expose the entire iris

Inner Brow Raise – The inner corners of eyebrows are raised

Jaw Drop – The jaw pulled downwards

Lid Tighten – The eye aperture narrowed and the eyelids tightened

Lip Corner Depressor – Lip corners dropping downwards (frown)

Lip Press – Pressing the lips together without pushing up the chin boss

Lip Pucker – The lips pushed forward

Lip Stretch – The lips pulled back laterally

Lip Suck – Pull of the lips and the adjacent skin into the mouth

Mouth Open – Lower lip dropped downwards

Nose Wrinkle – Wrinkles appear along the sides and across the root of the nose due to skin pulled upwards

Smile – Lip corners pulling outwards and upwards towards the ears, combined with other indicators from around the face

Smirk – Left or right lip corner pulled upwards and outwards

Upper Lip Raise – The upper lip moved upwards

Emoji Expressions

Laughing – Mouth opened and both eyes closed

Smiley – Smiling, mouth opened and both eyes opened

Relaxed – Smiling and both eyes opened

Wink – Either of the eyes closed

Kissing – The lips puckered and both eyes opened

Stuck Out Tongue – The tongue clearly visible

Stuck Out Tongue and Winking Eye – The tongue clearly visible and either of the eyes closed

Scream – The eyebrows raised and the mouth opened

Flushed – The eyebrows raised and both eyes widened

Smirk – Left or right lip corner pulled upwards and outwards

Disappointed – Frowning, with both lip corners pulled downwards

Rage – The brows furrowed, and the lips tightened and pressed

Neutral – Neutral face without any facial expressions

Using the Metrics

Emotion, Expression and Emoji metric scores indicate when users show a specific emotion or expression (e.g., a smile) along with the degree of confidence. The metrics can be thought of as detectors: as the emotion or facial expression occurs and intensifies, the score rises from 0 (no expression) to 100 (expression fully present).
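
One way to use a score as a detector is to threshold it frame by frame and report the spans where the expression is present. The sketch below assumes a plain list of per-frame scores and a cutoff of 50; both the `detect_segments` helper and the threshold value are assumptions for illustration, not part of the SDK.

```python
def detect_segments(scores, threshold=50.0):
    """Return (start, end) frame-index pairs where score >= threshold.

    `end` is exclusive, so a segment covers frames start .. end - 1.
    """
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # the expression just switched on
        elif score < threshold and start is not None:
            segments.append((start, i))  # the expression just switched off
            start = None
    if start is not None:
        # The expression was still present on the last frame.
        segments.append((start, len(scores)))
    return segments


if __name__ == "__main__":
    smile_scores = [0, 5, 40, 80, 95, 60, 20, 0, 70, 90]
    print(detect_segments(smile_scores))  # [(3, 6), (8, 10)]
```

The right threshold depends on the application; a lower value catches subtle expressions at the cost of more false detections.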

In addition, we also expose a composite emotional metric called valence, which gives feedback on the overall experience. Valence values from 0 to 100 indicate a neutral to positive experience, while values from -100 to 0 indicate a negative to neutral experience.
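
The valence range can be turned into a coarse label as in the sketch below. The `describe_valence` helper and its `neutral_band` parameter (how close to 0 still counts as neutral) are assumptions for this example; the text above only fixes the range of -100 to 100 and the sign convention.

```python
def describe_valence(valence, neutral_band=10.0):
    """Map a valence score in [-100, 100] to a coarse label.

    `neutral_band` is an assumed tolerance around 0 that is still
    reported as neutral.
    """
    if not -100.0 <= valence <= 100.0:
        raise ValueError("valence must be between -100 and 100")
    if valence > neutral_band:
        return "positive"
    if valence < -neutral_band:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for value in (-75.0, 3.0, 42.0):
        print(value, describe_valence(value))
```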


Our SDKs also provide the following metrics about the physical appearance:


The age classifier attempts to estimate the age range. Supported ranges: Under 18, from 18 to 24, 25 to 34, 35 to 44, 45 to 54, 55 to 64, and 65 Plus.
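
For aggregating results, it can help to map a known age in years onto the same ranges. The following sketch does that with a sorted list of lower bounds; the label strings and the `age_range` helper are assumptions for illustration, since the exact values returned by the SDK are not given here.

```python
import bisect

# Labels follow the supported ranges listed above; the exact strings
# are an assumption for this example.
AGE_RANGE_LABELS = [
    "Under 18", "18 to 24", "25 to 34", "35 to 44",
    "45 to 54", "55 to 64", "65 Plus",
]
# Lower bound (inclusive) of every range after "Under 18".
LOWER_BOUNDS = [18, 25, 35, 45, 55, 65]


def age_range(age):
    """Return the supported age range that contains `age` (in years)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many lower bounds are <= age, which is
    # exactly the index of the matching label.
    return AGE_RANGE_LABELS[bisect.bisect_right(LOWER_BOUNDS, age)]


if __name__ == "__main__":
    for age in (17, 18, 34, 65):
        print(age, "->", age_range(age))
```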


The ethnicity classifier attempts to identify the person’s ethnicity. Supported classes: Caucasian, Black African, South Asian, East Asian and Hispanic.

At the current level of accuracy, the ethnicity and age classifiers are more useful as a quantitative measure of demographics than to correctly identify the age and ethnicity on an individual basis. We are always looking to diversify the data sources included in training those metrics to improve their accuracy levels.


The gender classifier attempts to identify the human perception of gender expression.

In the case of video or live feeds, the Gender, Age and Ethnicity classifiers track a face for a window of time to build confidence in their decision. If the classifier is unable to reach a decision, the classifier value is reported as “Unknown”.
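
The idea of tracking a face over a window before committing to a decision can be sketched as a majority vote over recent per-frame guesses. This is not the SDK's actual logic, which is not described here; the `WindowedClassifier` class, the window size and the agreement threshold are all assumptions for illustration.

```python
from collections import Counter, deque


class WindowedClassifier:
    """Report a label only once recent per-frame guesses agree enough."""

    def __init__(self, window_size=30, min_agreement=0.8):
        # Assumed parameters: frames to remember and the fraction of
        # them that must share a label before it is reported.
        self.window = deque(maxlen=window_size)
        self.min_agreement = min_agreement

    def update(self, frame_label):
        """Add one per-frame guess and return the current decision."""
        self.window.append(frame_label)
        if len(self.window) < self.window.maxlen:
            return "Unknown"  # not enough evidence yet
        top_label, count = Counter(self.window).most_common(1)[0]
        if count / len(self.window) >= self.min_agreement:
            return top_label
        return "Unknown"  # the guesses disagree too much


if __name__ == "__main__":
    tracker = WindowedClassifier(window_size=5, min_agreement=0.8)
    for guess in ["Female", "Female", "Male", "Female", "Female", "Female"]:
        print(guess, "->", tracker.update(guess))
```

A longer window gives a steadier decision but takes longer to report anything other than "Unknown".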


The glasses classifier reports a confidence level of whether the subject in the image is wearing eyeglasses or sunglasses.

Face Tracking and Head Angle Estimation

The SDKs include our latest face tracker which calculates the following metrics:

Facial Landmarks Estimation

The tracking of the Cartesian coordinates for the facial landmarks. See the facial landmark mapping here.

Head Orientation Estimation

Estimation of the head orientation in 3-D space, expressed as Euler angles (pitch, yaw, roll).

Interocular Distance

The distance between the two outer eye corners.
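
The two geometric metrics above can be made concrete with a short sketch. It assumes landmarks are (x, y) pixel pairs and that the Euler angles are in degrees, with pitch about the x axis, yaw about the y axis and roll about the z axis, composed as R = Rz(roll) * Ry(yaw) * Rx(pitch). Angle conventions differ between libraries, so treat the order and axes as assumptions; the function names are hypothetical.

```python
import math


def interocular_distance(left_outer, right_outer):
    """Euclidean distance between the two outer eye corners, in pixels."""
    return math.hypot(right_outer[0] - left_outer[0],
                      right_outer[1] - left_outer[1])


def rotation_matrix(pitch_deg, yaw_deg, roll_deg):
    """3x3 rotation matrix for the assumed order Rz(roll) * Ry(yaw) * Rx(pitch)."""
    p, y, r = (math.radians(a) for a in (pitch_deg, yaw_deg, roll_deg))
    cp, sp = math.cos(p), math.sin(p)
    cy, sy = math.cos(y), math.sin(y)
    cr, sr = math.cos(r), math.sin(r)
    return [
        [cr * cy, cr * sy * sp - sr * cp, cr * sy * cp + sr * sp],
        [sr * cy, sr * sy * sp + cr * cp, sr * sy * cp - cr * sp],
        [-sy, cy * sp, cy * cp],
    ]


if __name__ == "__main__":
    print(interocular_distance((100, 120), (160, 120)))  # 60.0
    for row in rotation_matrix(0.0, 90.0, 0.0):
        print([round(v, 3) for v in row])
```

Interocular distance is often used as a proxy for how large the face appears in the frame, which makes it a convenient scale for normalizing other landmark measurements.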

Emotion detection – Speech

Our speech capability analyzes not what is said, but how it is said, observing changes in speech paralinguistics, tone, loudness, tempo, and voice quality to distinguish speech events, emotions, and gender. The underlying low-latency approach is key to enabling the development of real-time emotion-aware apps and devices.

Our first speech-based product is a cloud-based API that analyzes a pre-recorded audio segment, such as an MP3 file. The output file reports the speech events occurring in the audio segment every few hundred milliseconds, not just at the end of the entire utterance. An Emotion SDK that analyzes speech in real time will be available in the near future.

Data and accuracy

Our algorithms are trained using our emotion data repository, which has now grown to nearly 6 million faces analyzed in 87 countries. We continuously test our algorithms to provide the most reliable and accurate emotion metrics. Because we now also use deep learning approaches, we can very quickly tune our algorithms for high performance and accuracy. Our key emotions achieve accuracy in the high 90th percentile. We sampled our test set, composed of hundreds of thousands of emotion events, from our data repository. This data represents real-world, spontaneous facial expressions and vocal utterances, gathered under challenging conditions such as changes in lighting and background noise, and variances due to ethnicity, age, and gender. You can find more information on how we measure our accuracy here.

How to get it

Our emotion recognition technology is available in several products, from an easy-to-use SDK and API for developers to robust solutions for market research and advertising.