Academic Paper

Statistical and Deep Convolutional Feature Fusion for Emotion Detection from Audio Signal
Document Type
Conference
Source
2023 International Conference on Bio Signals, Images, and Instrumentation (ICBSII), pp. 1-7, Mar. 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
General Topics for Engineers
Signal Processing and Analysis
Support vector machines
Training
Measurement
Emotion recognition
Time-frequency analysis
Sentiment analysis
Feature extraction
Audio Emotion Classification (AEP)
Multi-Layer Perceptron (MLP)
Mel-Frequency Cepstral Coefficient (MFCC)
Visual Geometry Group 19 (VGG19)
Convolutional Neural Network (CNN)
Language
English
ISSN
2768-6450
Abstract
Speech is a crucial mode of expression through which individuals articulate their thoughts, and it can offer valuable insight into their emotional state. Considerable research has been conducted to identify metrics for determining the emotional sentiment hidden in an audio signal. This paper presents an exploratory analysis of various audio features, including Chroma features, MFCCs, spectral features, and flattened spectrogram features (obtained using the VGG-19 convolutional neural network), for sentiment analysis of audio signals. The study evaluates the effectiveness of combining these audio features in determining the emotional states expressed in speech, using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Baseline classifiers such as Random Forest, Multi-Layer Perceptron (MLP), Logistic Regression, XGBoost, and Support Vector Machine (SVM) are used to compare the performance of the features. The results provide insight into the potential of these audio features for determining the emotional states expressed in speech.
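The fusion described in the abstract can be sketched as an early-fusion pipeline: statistical descriptors computed from a spectrogram are concatenated with a flattened deep-feature vector, and the combined vector is fed to a baseline classifier. The following is a minimal, numpy-only illustration under assumed parameters (512-sample frames, 256-sample hop); the zero vector standing in for flattened VGG-19 activations is a placeholder, not the paper's actual extraction code.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram via a windowed short-time FFT (frames x bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def statistical_features(spec):
    """Per-bin mean/std plus mean/std of the per-frame spectral centroid."""
    bins = np.arange(spec.shape[1])
    centroid = (spec * bins).sum(axis=1) / (spec.sum(axis=1) + 1e-8)
    return np.concatenate([spec.mean(axis=0), spec.std(axis=0),
                           [centroid.mean(), centroid.std()]])

def fuse(statistical, deep):
    """Early fusion: concatenate both views into one vector for a classifier."""
    return np.concatenate([statistical, deep])

# Synthetic 1-second, 16 kHz tone standing in for a RAVDESS speech clip.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(audio)                  # (61, 257) for these parameters
stat = statistical_features(spec)          # 257 + 257 + 2 = 516 values
deep = np.zeros(4096)                      # placeholder flattened CNN features
fused = fuse(stat, deep)                   # 4612-dimensional fused vector
```

In practice the fused vectors for a labeled dataset would be standardized and passed to any of the baseline models named in the abstract (e.g. an SVM or MLP); the fusion step itself is just concatenation, so classifiers can be swapped freely.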