Academic paper

Enforcing Semantic Consistency for Cross Corpus Emotion Prediction Using Adversarial Discrepancy Learning in Emotion
Document Type
Periodical
Source
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.). 14(2):1098-1109, Jun. 2023
Subject
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Databases
Semantics
Emotion recognition
Acoustic distortion
Training
Nonlinear distortion
Correlation
Speech emotion recognition
generative adversarial network
cross corpus learning
semantic consistency
domain adaptation
Language
ISSN
1949-3045
2371-9850
Abstract
Mismatch between databases poses a challenge when performing emotion recognition on an unlabeled database collected under practical conditions using labeled source data. Alignment between the source and target is crucial for conventional neural networks; therefore, many studies have mapped the two domains into a common feature space. However, such work has neglected the distortion of emotion semantics across different conditions: a sample may be assigned a high emotional annotation in the target but a low one in the source. In this article, we propose the maximum regression discrepancy (MRD) network, which enforces semantic consistency between the source and target by adjusting the acoustic feature encoder to minimize the discrepancy on maximally distorted samples through adversarial training. We evaluate our framework in several experiments using three databases (USC IEMOCAP, MSP-Improv, and MSP-Podcast) for cross-corpus emotion prediction. Compared to the Source-only neural network and DANN, the MRD network demonstrates a significant improvement of between 5% and 10% in the concordance correlation coefficient (CCC) in cross-corpus prediction and of between 3% and 10% for evaluation on MSP-Podcast. We also visualize the effect of MRD on the feature representation to show the efficacy of the MRD structure we designed.
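For readers unfamiliar with the evaluation metric, the concordance correlation coefficient named in the abstract is commonly defined as CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The following is a minimal sketch of that standard definition, assuming population (not sample) statistics; the function name `concordance_cc` and the example values are hypothetical and are not taken from the article.

```python
from statistics import fmean


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient between two equal-length sequences.

    Assumption: population variance and covariance (divide by N), which is
    one common convention; the article may use a different one.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_t = fmean(y_true)
    mean_p = fmean(y_pred)

    # Population variances and covariance.
    var_t = fmean([(t - mean_t) ** 2 for t in y_true])
    var_p = fmean([(p - mean_p) ** 2 for p in y_pred])
    cov = fmean([(t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)])

    # The squared mean difference penalizes a constant offset between
    # predictions and labels, which plain Pearson correlation ignores.
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0.0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2.0 * cov / denom


# Hypothetical emotion-attribute ratings, for illustration only.
labels = [1.0, 2.0, 3.0]
shifted_predictions = [2.0, 3.0, 4.0]
print(concordance_cc(labels, labels))
print(concordance_cc(labels, shifted_predictions))
```

Unlike Pearson correlation, the shifted predictions score below 1 even though they are perfectly correlated with the labels, which is why CCC is often preferred for regression-style emotion attributes.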