Emotion Recognition from Images

Ronak Kosti*

An overview of the importance of emotion recognition and of how to recognize emotions from images.

A Brief Background on Emotion Research

Why do people feel the way they do? Why is someone happy and why is someone else sad? We call ourselves rational beings, so why is it that we are affected by these feelings and quite often rely on them? These basic inquiries have been pondered since ancient times by philosophers like Plato and Aristotle [1]. Later on, philosophers like Descartes [2] proposed breaking down the spectrum of human emotions into a few elemental components out of which all other emotions are synthesized. Although the inquiry was speculative and philosophical in nature, their attempts paved the way for future researchers to look for the underlying structure of human emotions. Darwin, a naturalist and a meticulous analyst (one of the most prominent figures in human history), was also puzzled by the richness of human feelings and their social constructs. He postulated that there might be a universal basis for all emotions, that there is a fixed set of emotions decipherable from facial expressions. He also investigated the display of feelings in animals [3].

Emotion_Figure_1.png

Figure 1: Random images depicting people in different activities, moods, locations, etc. (higher resolution available at: https://github.com/rkosti/emotic/blob/master/various_images.png)

During the 1970s, psychologists Ekman and Friesen, inspired by Darwin’s work, came up with a definitive coding of facial expressions into action units called the Facial Action Coding System (FACS) [4], which pushed emotion research forward. Their work on facial expression recognition through this coding system suggested that movements of specific muscles on our face correspond to distinct emotions. This augmented our understanding of human emotions, because it suggested that emotions can be codified (or quantified); and if emotions can be represented in a systematic manner, then that could be a breakthrough for studying them more rigorously. It also had a huge influence on subsequent research in human behavior analysis. Body posture and the surrounding scene context also influence the perception of emotions [5,6]. This showed that emotions are not localized on the human face but also depend on external sources.

Cognitive neuroscientists also started delving into the processes responsible for emotion elicitation in the brain. The amygdala, hippocampus and hypothalamus [7] are part of the limbic system of the brain, commonly called the emotional centre of the brain. The amygdala helps in processing emotions like fear, anxiety and pleasure; the hippocampus provides mechanisms for storing past experiences in the form of memories; and the hypothalamus controls motor functions, including emotional responses. Together, they form the limbic functional system that helps us maintain our emotional health.

All this research provides good evidence of the importance of emotions in our daily lives and, quite often, for our survival. It is no wonder that computer scientists have recently become interested in emotional analysis, specifically computing human emotions through expressions in face, voice and body [8]. Automatic emotion recognition systems are, therefore, of utmost importance; concrete application scenarios are discussed below in the section ‘Emotion Recognition – a Necessity’.

What are Emotions?
We, as social beings, have many modes of communicating our feelings: facial expressions, voice intonation, hand gestures, body posture and language (idioms, sarcasm, etc.). While trying to communicate, we may combine one or more of these to effectively impart what we feel. A holistic view of these feelings can be loosely termed Emotions. Their understanding and perception, however, change from person to person. When we see a happy person, there could be various combinations of the different modes mentioned above at work, which we interpret to recognize that the person is happy. In our work Emotion Recognition in Context [9], we explore how the scene (or the background) plays an important role in emotion perception (or recognition). When we try to convey our feelings to others, each person perceives those feelings in a very different manner. The perception differs depending on how effectively the feelings are communicated, and also on the sensitivity of the perceiver (the person trying to observe those feelings). For example, when asked to recognize the emotions felt by the girl in Figure 2, people interpreted them differently: Person A interpreted that the girl is feeling happy and excited about something (probably something she saw on her tablet), whereas Person B perceived that the girl is also surprised in addition to being happy and excited. Due to the inherent subjectiveness of this process, emotions are not only ambiguous to define but also very difficult to quantify. In our work [9], we tried to overcome this difficulty by representing emotions comprehensively using two different methods.

Emotion_Figure_2.png

Figure 2. Person A infers Happiness, Excitement. Person B infers Happiness, Excitement and Surprise.
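
To make the subjectivity concrete, here is a minimal sketch (plain Python; the class and field names are hypothetical, purely for illustration) of how per-annotator discrete labels like those in Figure 2 could be stored and combined:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Annotation:
        annotator: str
        categories: frozenset  # discrete emotion labels perceived by one person

    # The two interpretations of the girl in Figure 2:
    person_a = Annotation("A", frozenset({"Happiness", "Excitement"}))
    person_b = Annotation("B", frozenset({"Happiness", "Excitement", "Surprise"}))

    # One simple way to aggregate subjective annotations: take the union, keeping
    # every category that at least one annotator perceived.
    combined = person_a.categories | person_b.categories
    print(sorted(combined))  # ['Excitement', 'Happiness', 'Surprise']

Taking the union keeps the annotation comprehensive; how to weight annotator disagreement is itself a design choice.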


Universal Basic Emotions – or Not?

One thing is clear: the universality of emotions is a popular and widely used working assumption amongst researchers, academics and the general public. A lot of research articles, based on multiple experiments, have been published trying to prove the universality of basic emotions and their cross-cultural applicability. However, just as many (if not more) published articles refute this claim, also based on extensive experimentation and analysis. So, currently, the science of emotions is unclear about the universality of human emotions. The first recorded study on emotions was done by Charles Darwin who, in The Expression of the Emotions in Man and Animals (1872, republished in 1998), put forward the idea of the universality of human emotions.

The opinions of two prominent experts from psychology and neuroscience, Paul Ekman and Lisa Feldman Barrett, are rooted at opposite ends of the debate on the universality of basic human emotions. While P. Ekman contends that human emotions are basic and universal [10,11] (and that there are six of them, viz. happiness, sadness, fear, surprise, disgust and anger), L. F. Barrett asserts that emotions are constructed by the brain depending on many variables [12,13] (including culture, perception and the source of stimulation) and that there is no universality to them. Both positions are grounded in experiments and models, and each has its advantages and shortcomings depending on the situation; neither is completely true nor completely false. The problem, in my opinion, is that we have been trying very hard to root emotions in something basic, even though emotions have been shown, time and again, to be very subjective in nature. How, then, can such a concept be defined so that it encompasses all the different cultures, geographies and viewpoints, and still be unequivocal? There is huge research potential in this direction.

Emotion Recognition – a Necessity

From an application point of view, there are many important scenarios where automatic emotion recognition is crucial. Human Computer Interaction (HCI) involves a verbal or non-verbal interchange between machine and human. It is easy for us to understand what the machine is telling us (because we designed it); however, it is difficult for machines to understand our response. Automatic emotion recognition can extend the machine's limited capability to understand emotional responses from humans. Another application is in driver-assist technology. It is well known that human performance decreases during prolonged sleep deprivation [14], which increases the probability of accidents due to driver drowsiness. Automatic emotion recognition can help detect the states responsible for this kind of tiredness early on and help avoid road accidents. Emotion recognition is also very important for health applications: emotion recognition systems can help detect the early onset of psychological disorders (like autism, bipolar disorder and anxiety) and help diagnose their nature. Content monitoring on social networks like Facebook, Twitter and Instagram has become of utmost importance to filter unwanted content (violence, graphic images, etc.). Analyzing the affective (evoked emotions) content of an image or a tweet is essential to recognizing the potential harm it can cause. Human behavior analysis is an important research area where human emotions play a crucial role.

Although, computationally, it is very difficult to formalize what happiness is, or what we mean when we say that a person is angry, the current mobile market is rife with applications that claim to recognize our emotions. These applications, many of which claim to employ AI (Artificial Intelligence) or ML (Machine Learning), predominantly use facial expressions to detect the apparent emotions on the face. They run on algorithms that have been pre-trained with images of faces depicting different emotions. Pre-training helps build robust and effective systems, although the bottleneck is the data. Due to the current technological revolution, generating data (specifically labeled data) has become very easy and cheap.
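
As a rough illustration of that recipe, here is a minimal sketch (assuming PyTorch with a recent torchvision; the faces/ data folder and the six-label set are hypothetical) of fine-tuning a generically pre-trained CNN on labeled facial-expression images:

    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    # Hypothetical label set, in the alphabetical order ImageFolder assigns.
    EXPRESSIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

    # Backbone pre-trained on generic images; only the final layer is replaced.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, len(EXPRESSIONS))

    # "faces/" is a hypothetical folder of face crops, one sub-directory per label.
    transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_set = datasets.ImageFolder("faces/", transform=transform)
    loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()  # one expression label per face

    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()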


Emotion Recognition from Images

As the saying goes, “A picture is worth a thousand words.” Irrespective of the correctness of the claim made in the idiom, vision is one of the five sensory mechanisms (vision, smell, taste, hearing, touch) through which we visualize, understand and navigate the physical world around us. Vision helps us anticipate incoming obstacles so that we can move around efficiently, helps us recognize the objects surrounding us (including people), and lets us view the physical world (its complexity and grandeur) in the visible spectrum.

A photographic camera is the main source of images today. With revolutionary advances in chip manufacturing, design and electronic systems, cameras have become ubiquitous and available very cheaply across a multitude of devices, including mobile phones, tablets, laptops, surveillance systems, satellites, automobiles, etc. This has led to the generation of billions of photos which capture almost every aspect of our daily lives. There are lots of images of people showing various emotions and performing diverse activities in different situations, backgrounds and social gatherings. A few such images are shown in Figure 1. Just by taking a short look at each of those images, we process a multitude of information:

  • What activity is being done - skiing, playing, painting, etc.

  • What is the location - mountain, living-room, office, etc.

  • Who is present and what is that person doing - a woman holding a Wii-like console, a couple of hikers in the snow, a depressed woman, an artist painting, etc.

  • What is the mood - engaged in the painting, happy with the dog, depressed in the office, etc.

This is just off the top of my head; when each person looks at these images, different objects, relationships between them, different aspects, reactions, emotions, etc. are revealed. A single image is a rich source of information, and current machine learning (and/or computer vision) algorithms have made it possible to process that information and quantify it. In particular, it is possible to recognize (or perceive) a person’s apparent emotion from an image.

For example, look at Figure 3. The person (in this case a boy on a skateboard) is marked with a red bounding box. We observe that the person is on a skateboard and looking ahead with focus, which shows that he is engaged in the activity. Also, from his body posture one can say that he is performing a ‘trick’ on his skateboard which seems risky, so we can say that he is also excited at that instant (due to the adrenaline rush he might be getting while doing this ‘trick’). One more thing to note is that, due to the curvature of the road, he has to keep anticipating the road ahead, or else he might lose balance. Overall, he is experiencing three distinct feelings, viz. Anticipation, Excitement and Engagement. Another way of understanding emotional states is to evaluate the intensity of three different aspects: Valence (V) represents how positive (10) or negative (1) a person feels in the situation, Arousal (A) represents how excited (10) or calm (1) a person is, and Dominance (D) represents how confident and in control (10) or overwhelmed (1) a person feels in that given situation. Using these measures, we can predict the Valence, Arousal and Dominance values for the person in Figure 3. The boy is neither sad nor happy about the situation, so (V)alence ~ 6; he looks very excited given the kind of activity he is doing, so (A)rousal ~ 9; and he looks very confident about what he is doing, so (D)ominance ~ 10. For more details on the emotion representation, I suggest referring to [9].

Emotion_Figure_3.png

Figure 3: Example of predicting the emotion of a person from an image. (higher resolution available at: https://github.com/rkosti/emotic/blob/master/sample_annotation.png)
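
To make the prediction pipeline concrete, here is a minimal sketch (PyTorch assumed; backbones and layer sizes are illustrative, not the exact architecture of [9]) of a two-branch network in the spirit of our work: one branch sees the person’s body crop, the other sees the whole image as context, and two heads output discrete emotion categories and continuous VAD values:

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CATEGORIES = 26  # number of discrete emotion categories used in [9]

    class TwoBranchEmotionNet(nn.Module):
        def __init__(self):
            super().__init__()
            # Separate feature extractors for the body crop and the full image.
            self.body = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
            self.context = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
            self.body.fc = nn.Identity()     # keep the 512-d backbone features
            self.context.fc = nn.Identity()
            self.categories = nn.Linear(1024, NUM_CATEGORIES)  # multi-label logits
            self.vad = nn.Linear(1024, 3)    # valence, arousal, dominance

        def forward(self, body_crop, full_image):
            feats = torch.cat([self.body(body_crop), self.context(full_image)], dim=1)
            return self.categories(feats), self.vad(feats)

    model = TwoBranchEmotionNet()
    body = torch.randn(1, 3, 224, 224)   # e.g. the crop around the boy in Figure 3
    scene = torch.randn(1, 3, 224, 224)  # the whole image
    category_logits, vad = model(body, scene)
    category_probs = torch.sigmoid(category_logits)  # emotions can co-occur

The category head uses a per-label sigmoid rather than a softmax because, as Figures 2 and 3 show, a person can display several emotions at once; the VAD head simply regresses the three continuous values.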

Summary

Analyzing human behavior is important for multiple reasons. When we look at a person, it is very easy for us to put ourselves in their situation, and even to feel, to some degree, the things that this person appears to be feeling. We frequently use this exceptional ability to estimate how others feel in our everyday lives. Such empathizing capacity helps us be more helpful, sensitive, sympathetic, affectionate and cordial in our social interactions. More generally, this capacity helps us understand other people, the motivations and goals behind their actions, and how they will react to different events. Automatic recognition of emotions has many applications in environments where machines need to interact with or monitor humans. For instance, an automatic tutor on an online learning platform could provide better feedback to a student according to her level of motivation or frustration. This makes emotion recognition one of the most important areas of human behavior analysis.

* Ronak Kosti is currently a research scholar at the Universitat Oberta de Catalunya, Barcelona, Spain.

Bibliography

  1. R. de Sousa. Emotion. In E. N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, winter 2017 edition, 2017.
    [https://plato.stanford.edu/entries/emotion/]

  2. A. R. Damasio. Descartes’ Error: Emotion, Reason, and the Human Brain. Putnam, New York, 1994.
    [https://monoskop.org/File:Damasio_Antonio_R_Descartes_Error_Emotion_Reason_and_the_Human_Brain.pdf]

  3. C. Darwin. The expression of the emotions in man and animals. Oxford University Press, USA, 1872/1998.
    [http://darwin-online.org.uk/content/frameset?pageseq=1&itemID=F1142&viewtype=text]

  4. P. Ekman and W. V. Friesen. The repertoire of nonverbal behavior: Categories, origins, usage, and coding. Semiotica, 1(1):49–98, 1969.
    [https://www.paulekman.com/wp-content/uploads/2013/07/The-Repertoire-Of-Nonverbal-Behavior-Categories-Origins-.pdf]

  5. R. R. Hassin, H. Aviezer, and S. Bentin. Inherently ambiguous: Facial expressions of emotions, in context. Emotion Review, 5(1):60–65, 2013. doi: 10.1177/1754073912451331.
    [http://labconscious.huji.ac.il/wp-content/uploads/2017/09/1-s2.0-S2352250X1730043X-main.pdf]

  6. H. Aviezer, R. Hassin, S. Bentin, and Y. Trope. Putting facial expressions back in context. In First Impressions, pages 255–286, 2008.
    [http://cel.huji.ac.il/publications/pdfs/Aviezer_et_al_2008_Chapter_in_First_Impressions.pdf]

  7. C. Stephani. Limbic system. In M. J. Aminoff and R. B. Daroff, editors, Encyclopedia of the Neurological Sciences (Second Edition), pages 897–900. Academic Press, Oxford, 2014. ISBN 978-0-12-385158-1. doi: 10.1016/B978-0-12-385157-4.01157-X.
    [http://www.sciencedirect.com/science/article/pii/B978012385157401157X]

  8. T. Bänziger, D. Grandjean, and K. R. Scherer. Emotion recognition from expressions in face, voice, and body: The Multimodal Emotion Recognition Test (MERT). Emotion, 9(5):691, 2009.
    [https://www.researchgate.net/publication/26869810_Emotion_Recognition_From_Expressions_in_Face_Voice_and_Body_The_Multimodal_Emotion_Recognition_Test_MERT]

  9. R. Kosti, J. M. Alvarez, A. Recasens, and A. Lapedriza. Emotion recognition in context. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
    [http://openaccess.thecvf.com/content_cvpr_2017/html/Kosti_Emotion_Recognition_in_CVPR_2017_paper.html]

  10. P. Ekman. An argument for basic emotions. Cognition & Emotion, 6(3-4):169–200, 1992.
    [https://www.paulekman.com/wp-content/uploads/2013/07/An-Argument-For-Basic-Emotions.pdf]

  11. P. Ekman and D. Cordaro. What is meant by calling emotions basic. Emotion Review, 3:364–370, 2011.
    [http://journals.sagepub.com/doi/pdf/10.1177/1754073911410740]

  12. L. F. Barrett. Are emotions natural kinds? Perspectives on Psychological Science, 1(1):28–58, 2006.
    [http://journals.sagepub.com/doi/pdf/10.1111/j.1745-6916.2006.00003.x]

  13. L. F. Barrett. The theory of constructed emotion: an active inference account of interoception and categorization. Social Cognitive and Affective Neuroscience, 12(1):1–23, 2017.
    [https://www.affective-science.org/pubs/2017/barrett-tce-scan-2017.pdf]

  14. H. F. Posada-Quintero, J. B. Bolkhovsky, M. Qin, and K. H. Chon. Human performance deterioration due to prolonged wakefulness can be accurately detected using time-varying spectral analysis of electrodermal activity. Human Factors, 2018.
    [http://journals.sagepub.com/doi/pdf/10.1177/0018720818781196]

 
