A CNN-LSTM BASED DEEP NEURAL NETWORK FOR FACIAL EMOTION DETECTION IN VIDEOS
Human beings use emotions as a medium of understanding when communicating with one another. Since the face is the primary point of contact during communication and the most expressive part of the body, facial emotion detection in videos is a challenging and interesting problem to address. Facial expressions are a form of non-verbal communication, and inferring a person's emotional state from them has many use cases: in marketing research, understanding customers' responses to various products; in virtual classrooms, gauging students' comprehension; in job interviews, tracking changes in the interviewee's emotional state; and more. This paper proposes a CNN-LSTM based neural network trained on the CREMA-D dataset and tested on the RAVDESS dataset for six basic emotions: Angry, Happy, Sad, Fear, Disgust, and Neutral. The faces in the videos were masked using the OpenFace toolkit, which focuses attention on the face and ignores the background; the masked frames were then fed to the convolutional neural network. The research focuses on LSTM networks, which can exploit the sequential nature of the frame data to aid the final prediction of emotion in a video. We achieved an accuracy of 78.52% on the CREMA-D dataset and, testing the same model on the RAVDESS dataset, an accuracy of 63.35%. This work helps machines understand emotions, enabling systems to make better decisions and respond appropriately to the user.
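The processing pipeline described above (face-masked frames, per-frame convolutional features, an LSTM carried across the frame sequence, and a six-way classification) can be sketched in miniature with NumPy. This is an illustrative toy under stated assumptions, not the paper's trained model: the helper names (`conv_feature`, `lstm_step`, `predict_emotion`), the tiny kernel count and hidden width, and the random untrained weights are all assumptions for demonstration.

```python
import numpy as np

# Six basic emotions used in the paper
EMOTIONS = ["Angry", "Happy", "Sad", "Fear", "Disgust", "Neutral"]

def conv_feature(frame, kernel):
    """Valid 2D convolution + ReLU + global average pooling -> one scalar.
    A toy stand-in for a single CNN feature map."""
    kh, kw = kernel.shape
    H, W = frame.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    return float(np.maximum(out, 0.0).mean())

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; gate pre-activations stacked as [i, f, o, g]."""
    z = W @ x + U @ h + b
    n = h.size
    i = sigmoid(z[:n])          # input gate
    f = sigmoid(z[n:2 * n])     # forget gate
    o = sigmoid(z[2 * n:3 * n]) # output gate
    g = np.tanh(z[3 * n:])      # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def predict_emotion(frames, rng):
    """frames: list of 2D grayscale arrays (face-masked frames).
    Returns a probability distribution over the six emotions.
    Weights are random here, so the prediction itself is meaningless."""
    n_kernels, hidden = 4, 8  # illustrative sizes
    kernels = [rng.standard_normal((3, 3)) for _ in range(n_kernels)]
    W = rng.standard_normal((4 * hidden, n_kernels)) * 0.1
    U = rng.standard_normal((4 * hidden, hidden)) * 0.1
    b = np.zeros(4 * hidden)
    Wout = rng.standard_normal((len(EMOTIONS), hidden)) * 0.1

    h, c = np.zeros(hidden), np.zeros(hidden)
    for frame in frames:
        feats = np.array([conv_feature(frame, k) for k in kernels])
        h, c = lstm_step(feats, h, c, W, U, b)  # state carried across frames

    logits = Wout @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax over the six emotions

rng = np.random.default_rng(0)
frames = [rng.standard_normal((16, 16)) for _ in range(5)]  # 5 dummy frames
probs = predict_emotion(frames, rng)
print(EMOTIONS[int(np.argmax(probs))], probs.round(3))
```

In the actual system, `conv_feature` would be a full trained CNN and the frames would come from OpenFace-masked video; the key design point the sketch shows is that the LSTM hidden state is updated once per frame, so the final prediction depends on the whole sequence rather than any single frame.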
A. A. A. Zamil, S. Hasan, S. M. Jannatul Baki, J. M. Adam and I. Zaman, “Emotion Detection from Speech Signals using Voting Mechanism on Classified Frames,” International Conference on Robotics, Electrical and Signal Processing Techniques, 2019, pp. 281-285.
M. G. de Pinto, M. Polignano, P. Lops and G. Semeraro, “Emotions Understanding Model from Spoken Language using Deep Neural Networks and Mel-Frequency Cepstral Coefficients,” IEEE Conference on Evolving and Adaptive Intelligent Systems, 2020, pp. 1-5.
E. Ghaleb, M. Popa and S. Asteriadis, “Multimodal and Temporal Perception of Audio-visual Cues for Emotion Recognition,” 8th International Conference on Affective Computing and Intelligent Interaction, 2019, pp. 552-558.
Z. Rzayeva and E. Alasgarov, “Facial Emotion Recognition using Convolutional Neural Networks,” IEEE 13th International Conference on Application of Information and Communication Technologies, 2019, pp. 1-5.
S. R. Livingstone, F. A. Russo, “The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English,” PLoS ONE, Vol. 13, No. 5, 2018, e0196391.
S. Bursic, G. Boccignone, A. Ferrara, A. D’Amelio, R. Lanzarotti, “Improving the Accuracy of Automatic Facial Expression Recognition in Speaking Subjects with Deep Learning.” Appl. Sci. Vol. 10, No. 11, 2020, 4002.
T. Baltrušaitis, A. Zadeh, Y. C. Lim and L.-P. Morency, “OpenFace 2.0: Facial Behavior Analysis Toolkit,” IEEE International Conference on Automatic Face and Gesture Recognition, 2018.
M. F. H. Siddiqui, A. Y. Javaid, “A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images,” Multimodal Technologies and Interaction, Vol. 4, No. 3, 2020, p. 46.
S. W. Byun, S. P. Lee, “Human emotion recognition based on the weighted integration method using image sequences and acoustic features,” Multimedia Tools and Applications, 2020, pp. 1-15.
X. Wang, X. Chen, C. Cao, “Human emotion recognition by optimally fusing facial expression and speech feature,” Signal Processing: Image Communication, Vol. 84, 2020, 115831.
Y. Ma, Y. Hao, M. Chen, J. Chen, P. Lu, A. Košir, “Audio-visual emotion fusion (AVEF): A deep efficient weighted approach.” Information Fusion Vol. 46, 2019, pp. 184–192.
M. S. Hossain, G. Muhammad, “Emotion recognition using deep learning approach from audio–visual emotional big data.” Information Fusion Vol. 49, 2019, pp. 69–78.
H. Cao, D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova, R. Verma, “CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset,” IEEE Transactions on Affective Computing, Vol. 5, No. 4, 2014, pp. 377-390.
G. Deepak, L. Joonwhoan, “Geometric feature-based facial expression recognition in image sequences using multi-class AdaBoost and support vector machines,” Sensors, Vol. 13, No. 6, 2013, pp. 7714-7734.
J. Mira, K. Byoung Chul, N. Jae Yeal, “Facial landmark detection based on an ensemble of local weighted regressors during real driving situation,” International Conference on Pattern Recognition (ICPR), 2016, pp. 2198-2203.
J. Mira, K. Byoung Chul, K. Sooyeong, N. Jae Yeal, “Driver facial landmark detection in real driving situations,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 28, No. 10, 2018, pp. 2753-2767.
R. A. Khan, A. Meyer, H. Konik, S. Bouakaz, “Framework for reliable, real-time facial expression recognition for low resolution images,” Pattern Recognition Letters, Vol. 34, No. 10, 2013, pp. 1159-1168.
M. H. Siddiqi, R. Ali, A. M. Khan, Y. T. Park, S. Lee, “Human facial expression recognition using stepwise linear discriminant analysis and hidden conditional random fields,” IEEE Transactions on Image Processing, Vol. 24, No. 4, 2015, pp. 1386-1398. https://doi.org/10.1109/TIP.2015.2405346
D. Ghimire, S. Jeong, J. Lee, S. H. Park, “Facial expression recognition based on local region specific features and support vector machines,” Multimedia Tools and Applications, Vol. 76, No. 6, 2017, pp. 7803-7821. https://doi.org/10.1007/s11042-016-3418-y
S. L. Happy, A. George, A. Routray, “A real time facial expression classification system using local binary patterns.” IEEE Proceedings of 4th International Conference on Intelligent Human Computer Interaction, 2012, pp. 1-5.
B. Hasani, M. H. Mahoor, “Facial expression recognition using enhanced deep 3D convolutional neural networks,” IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 30-40.
Copyright (c) 2021 Arnold Sachith A Hans, Smitha Rao
This work is licensed under a Creative Commons Attribution 4.0 International License.