Artificial emotional intelligence, or Emotion AI, is also known as emotion recognition or emotion detection technology. In market research, it is commonly referred to as facial coding.
Humans use many non-verbal cues, such as facial expressions, gestures, body language and tone of voice, to communicate their emotions. Our vision is to develop Emotion AI that can detect emotion the way humans do, from multiple channels. Our long-term goal is to develop “Multimodal Emotion AI” that combines analysis of both face and speech as complementary signals to provide richer insight into the human expression of emotion. For several years, Affectiva has offered industry-leading technology for the analysis of facial expressions of emotion. Most recently, Affectiva has added speech capabilities, now available to select beta testers.
Emotion detection – Face
Our Emotion AI unobtrusively measures unfiltered and unbiased facial expressions of emotion, using an optical sensor or just a standard webcam. Our technology first identifies a human face in real time or in an image or video. Computer vision algorithms identify key landmarks on the face – for example, the corners of your eyebrows, the tip of your nose, the corners of your mouth. Deep learning algorithms then analyze pixels in those regions to classify facial expressions. Combinations of these facial expressions are then mapped to emotions.
In our products, we measure 7 emotion metrics: anger, contempt, disgust, fear, joy, sadness and surprise. In addition, we provide 20 facial expression metrics. In our SDK and API, we also provide emojis, gender, age, ethnicity and a number of other metrics. Learn more about our metrics.
The face provides a rich canvas of emotion. Humans are innately programmed to express and communicate emotion through facial expressions. Affdex scientifically measures and reports the emotions and facial expressions using sophisticated computer vision and machine learning techniques.
When you use the Affdex SDK in your applications, you will receive facial expression output in the form of Affdex metrics: seven emotion metrics, 20 facial expression metrics, 13 emojis, and four appearance metrics.
Furthermore, the SDK allows for measuring valence and engagement, as alternative metrics for measuring the emotional experience.
Engagement: A measure of facial muscle activation that illustrates the subject’s expressiveness. The range of values is from 0 to 100.
Valence: A measure of the positive or negative nature of the recorded person’s experience. The range of values is from -100 to 100.
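The two ranges above can be captured in a small container that rejects out-of-range values. This is a minimal sketch in Python; the class name `ExperienceMetrics` and its field names are illustrative assumptions and are not part of the Affdex SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperienceMetrics:
    """Hypothetical container for the two metrics; not an Affdex SDK type."""

    engagement: float  # expressiveness, 0 to 100
    valence: float     # negative to positive experience, -100 to 100

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if not 0 <= self.engagement <= 100:
            raise ValueError(f"engagement out of range: {self.engagement}")
        if not -100 <= self.valence <= 100:
            raise ValueError(f"valence out of range: {self.valence}")


sample = ExperienceMetrics(engagement=42.0, valence=-15.5)
```

Validating at construction time keeps later analysis code free of range checks.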
How do we map facial expressions to emotions?
The Emotion predictors use the observed facial expressions as input to calculate the likelihood of an emotion.
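One simple way to picture such a predictor is a weighted combination of expression scores, clamped to the 0 to 100 range. The sketch below is only an illustration of that idea: the weights, the expression keys, and the function name `emotion_likelihood` are assumptions made for this example and do not describe the actual Affdex predictors.

```python
from typing import Dict, Mapping

# Hypothetical weights, chosen only for illustration. Positive weights raise
# the likelihood of the emotion, negative weights lower it.
JOY_WEIGHTS: Dict[str, float] = {
    "smile": 1.0,
    "cheek_raise": 0.3,
    "brow_furrow": -0.5,
}


def emotion_likelihood(expressions: Mapping[str, float],
                       weights: Mapping[str, float]) -> float:
    """Combine expression scores (0-100) into a likelihood clamped to 0-100."""
    raw = sum(w * expressions.get(name, 0.0) for name, w in weights.items())
    return max(0.0, min(100.0, raw))


frame = {"smile": 80.0, "cheek_raise": 40.0, "brow_furrow": 10.0}
joy = emotion_likelihood(frame, JOY_WEIGHTS)
```

Expressions missing from a frame are treated as a score of 0, so a partially observed face still produces a value.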
Facial Expressions
Attention – Measure of focus based on the head orientation
Brow Furrow – Both eyebrows moved lower and closer together
Brow Raise – Both eyebrows moved upwards
Cheek Raise – Lifting of the cheeks, often accompanied by “crow’s feet” wrinkles at the eye corners
Chin Raise – The chin boss and the lower lip pushed upwards
Dimpler – The lip corners tightened and pulled inwards
Eye Closure – Both eyelids closed
Eye Widen – The upper lid raised sufficiently to expose the entire iris
Inner Brow Raise – The inner corners of eyebrows are raised
Jaw Drop – The jaw pulled downwards
Lid Tighten – The eye aperture narrowed and the eyelids tightened
Lip Corner Depressor – Lip corners dropping downwards (frown)
Lip Press – Pressing the lips together without pushing up the chin boss
Lip Pucker – The lips pushed forward
Lip Stretch – The lips pulled back laterally
Lip Suck – Pull of the lips and the adjacent skin into the mouth
Mouth Open – Lower lip dropped downwards
Nose Wrinkle – Wrinkles appear along the sides and across the root of the nose due to skin pulled upwards
Smile – Lip corners pulling outwards and upwards towards the ears, combined with other indicators from around the face
Smirk – Left or right lip corner pulled upwards and outwards
Upper Lip Raise – The upper lip moved upwards
Emojis
Laughing – Mouth opened and both eyes closed
Smiley – Smiling, mouth opened and both eyes opened
Relaxed – Smiling and both eyes opened
Wink – Either of the eyes closed
Kissing – The lips puckered and both eyes opened
Stuck Out Tongue – The tongue clearly visible
Stuck Out Tongue and Winking Eye – The tongue clearly visible and either of the eyes closed
Scream – The eyebrows raised and the mouth opened
Flushed – The eyebrows raised and both eyes widened
Smirk – Left or right lip corner pulled upwards and outwards
Disappointed – Frowning, with both lip corners pulled downwards
Rage – The brows furrowed, and the lips tightened and pressed
Neutral – Neutral face without any facial expressions
Using the Metrics
Emotion, Expression and Emoji metric scores indicate when users show a specific emotion or expression (e.g., a smile) along with the degree of confidence. The metrics can be thought of as detectors: as the emotion or facial expression occurs and intensifies, the score rises from 0 (no expression) to 100 (expression fully present).
In addition, we expose a composite emotional metric called valence, which gives feedback on the overall experience. Valence values from 0 to 100 indicate a neutral to positive experience, while values from -100 to 0 indicate a negative to neutral experience.
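As a rough illustration of reading these scores, the sketch below picks out the frames where a detector score is high and labels a valence value by its sign. The threshold of 50 and the function names are assumptions made for this example; the source does not prescribe a cut-off.

```python
from typing import List, Sequence


def frames_above(scores: Sequence[float], threshold: float = 50.0) -> List[int]:
    """Return indices of frames whose detector score (0-100) meets the threshold.

    The default threshold is an arbitrary illustrative choice.
    """
    return [i for i, s in enumerate(scores) if s >= threshold]


def valence_label(valence: float) -> str:
    """Label a valence value (-100 to 100) by its sign."""
    if valence > 0:
        return "positive"
    if valence < 0:
        return "negative"
    return "neutral"


smile_scores = [0.0, 12.5, 64.0, 97.3, 30.1]
smiling_frames = frames_above(smile_scores)
```

In practice the threshold would be tuned per metric and per application.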
Our SDKs also provide the following metrics about the physical appearance:
The age classifier attempts to estimate the age range. Supported ranges: Under 18, from 18 to 24, 25 to 34, 35 to 44, 45 to 54, 55 to 64, and 65 Plus.
The ethnicity classifier attempts to identify the person’s ethnicity. Supported classes: Caucasian, Black African, South Asian, East Asian and Hispanic.
At the current level of accuracy, the ethnicity and age classifiers are more useful as a quantitative measure of demographics than to correctly identify the age and ethnicity on an individual basis. We are always looking to diversify the data sources included in training those metrics to improve their accuracy levels.
The gender classifier attempts to identify the human perception of gender expression.
In the case of video or live feeds, the Gender, Age and Ethnicity classifiers track a face for a window of time to build confidence in their decision. If the classifier is unable to reach a decision, the classifier value is reported as “Unknown”.
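The idea of building confidence over a window can be sketched as a majority vote over recent per-frame labels. This is an assumed approach for illustration only, not the SDK's actual method; the class name `WindowedLabel`, the window size, and the `min_share` parameter are all hypothetical.

```python
from collections import Counter, deque
from typing import Deque


class WindowedLabel:
    """Hypothetical majority-vote smoother; not the SDK's actual method."""

    def __init__(self, window: int = 30, min_share: float = 0.6) -> None:
        self.window = window
        self.min_share = min_share
        self.votes: Deque[str] = deque(maxlen=window)

    def update(self, frame_label: str) -> str:
        """Add one per-frame label and return the current decision."""
        self.votes.append(frame_label)
        if len(self.votes) < self.window:
            return "Unknown"  # window not yet full
        label, count = Counter(self.votes).most_common(1)[0]
        # Commit only when one label clearly dominates the window.
        if count / len(self.votes) >= self.min_share:
            return label
        return "Unknown"


tracker = WindowedLabel(window=5, min_share=0.6)
labels = ["25-34", "25-34", "35-44", "25-34", "25-34", "25-34"]
outputs = [tracker.update(label) for label in labels]
```

Here the first four frames yield "Unknown" because the window is not yet full; after that the dominant label is reported.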
Glasses: A confidence level indicating whether the subject in the image is wearing eyeglasses or sunglasses.
Face Tracking and Head Angle Estimation
The SDKs include our latest face tracker which calculates the following metrics:
Facial Landmarks Estimation
The tracking of the Cartesian coordinates for the facial landmarks. See the facial landmark mapping here.
Head Orientation Estimation
Estimation of the head orientation in 3-D space, expressed as Euler angles (pitch, yaw, roll).
Interocular Distance
The distance between the two outer eye corners.
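Given landmark coordinates, the distance between the two outer eye corners (the interocular distance) is a plain Euclidean distance. A minimal sketch, assuming landmarks are available as (x, y) pixel pairs; the function name and the sample coordinates are made up for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def interocular_distance(left_outer: Point, right_outer: Point) -> float:
    """Euclidean distance between the two outer eye corners, in pixels."""
    return math.dist(left_outer, right_outer)


# Made-up coordinates for the two outer eye corners.
d = interocular_distance((120.0, 200.0), (180.0, 280.0))
```

Because the value scales with how close the face is to the camera, it is often used to normalize other pixel measurements.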
Emotion detection – Speech
Our speech capability analyzes not what is said, but how it is said, observing changes in speech paralinguistics, tone, loudness, tempo, and voice quality to distinguish speech events, emotions, and gender. The underlying low latency approach is key to enabling the development of real-time emotion-aware apps and devices.
Our first speech-based product is a cloud-based API that analyzes a pre-recorded audio segment, such as an MP3 file. The output file provides analysis of the speech events occurring in the audio segment every few hundred milliseconds, not just at the end of the entire utterance. An Emotion SDK that analyzes speech in real time will be available in the near future.
Data and accuracy
Our algorithms are trained using our emotion data repository, which has now grown to nearly 6 million faces analyzed in 87 countries. We continuously test our algorithms to provide the most reliable and accurate emotion metrics. Now that we also use deep learning approaches, we can very quickly tune our algorithms for high performance and accuracy. Our key emotions achieve accuracy in the high 90th percentile. We sampled our test set, comprising hundreds of thousands of emotion events, from our data repository. This data represents real-world, spontaneous facial expressions and vocal utterances, made under challenging conditions such as changes in lighting and background noise, and variances due to ethnicity, age, and gender. You can find more information on how we measure our accuracy here.
How to get it
Our emotion recognition technology is available in several products, from an easy-to-use SDK and API for developers to robust solutions for market research and advertising.