Academic Paper

Deep Neural Networks for Comprehensive Multimodal Emotion Recognition
Document Type
Conference
Source
2023 International Conference on Disruptive Technologies (ICDT), pp. 462-466, May 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Training
Visualization
Emotion recognition
Machine learning algorithms
Neural networks
Feature extraction
Data models
LSTM
deep learning
learning end to end
Convolution Neural Network
Abstract
Emotions can be expressed in many different ways, which makes automatic affect recognition challenging. Several applications stand to benefit from this technology, including audiovisual search and human-machine interfaces. Recently, neural networks have been developed that assess emotional states with unprecedented accuracy. We propose an approach to emotion recognition that uses both visual and auditory signals. Extracting relevant features is crucial for accurately representing the nuanced emotions conveyed across a wide range of speech patterns. To this end, we use a Convolutional Neural Network (CNN) to extract features from the audio track and a 50-layer deep residual network (ResNet) to process the visual track. Beyond feature extraction, the machine learning algorithm should also be robust to outliers and able to model the surrounding context. To address this, Long Short-Term Memory (LSTM) networks are used. We train the system end to end on the RECOLA dataset from the AVEC 2016 emotion recognition research challenge, and we show that our method outperforms prior approaches that relied on handcrafted auditory and visual features for identifying genuine emotional states. The visual modality is shown to predict valence more accurately than arousal. The best results for the valence dimension on the RECOLA dataset are reported in Table III.
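The abstract compares how accurately valence and arousal are predicted but does not name the evaluation metric. The AVEC 2016 challenge scored continuous valence and arousal predictions with the concordance correlation coefficient (CCC), so the following is a minimal sketch that assumes CCC is the measure behind those comparisons. The function name `concordance_cc` and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def concordance_cc(predictions, targets):
    """Concordance correlation coefficient (CCC) of two equal-length sequences.

    Assumption: CCC is the metric used to score valence/arousal predictions,
    as in the AVEC 2016 challenge; the abstract itself does not state this.
    """
    if len(predictions) != len(targets) or len(predictions) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_p = fmean(predictions)
    mean_t = fmean(targets)

    # Population (divide-by-N) variances and covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in predictions])
    var_t = fmean([(t - mean_t) ** 2 for t in targets])
    cov = fmean([(p - mean_p) * (t - mean_t)
                 for p, t in zip(predictions, targets)])

    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0:
        # Both sequences are constant and equal, so CCC is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    # Hypothetical per-frame valence annotations and predictions.
    gold = [0.10, 0.20, 0.25, 0.15, 0.05]
    pred = [0.12, 0.18, 0.30, 0.10, 0.02]
    print(f"CCC = {concordance_cc(pred, gold):.3f}")
```

Unlike Pearson correlation, CCC also penalizes a constant offset or scale mismatch between prediction and annotation, which is why a perfectly correlated but shifted prediction scores below 1.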