Academic Article

Are 3D Face Shapes Expressive Enough for Recognising Continuous Emotions and Action Unit Intensities?
Document Type
Periodical
Source
IEEE Transactions on Affective Computing, 15(2):535-548, Jun. 2024
Subject
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Face recognition
Three-dimensional displays
Shape
Solid modeling
Gold
Task analysis
Computational modeling
Facial expression analysis
dimensional affect recognition
action unit intensity estimation
3D morphable models
Language
English
ISSN
1949-3045
2371-9850
Abstract
Recognising continuous emotions and action unit (AU) intensities from face videos requires a spatial and temporal understanding of expression dynamics. Existing works primarily rely on 2D face appearance features to extract such dynamics. This work focuses on a promising alternative based on parametric 3D face alignment models, which disentangle different factors of variation, including expression-induced shape variations. We aim to understand how expressive 3D face shapes are in estimating valence-arousal and AU intensities compared to state-of-the-art 2D appearance-based models. We benchmark five recent 3D face models: ExpNet, 3DDFA-V2, RingNet, DECA, and EMOCA. In valence-arousal estimation, expression features of 3D face models consistently surpassed previous works and yielded average concordance correlations of .745 and .574 on the SEWA and AVEC 2019 CES corpora, respectively. We also study how 3D face shapes performed on AU intensity estimation on the BP4D and DISFA datasets, and report that 3D face features were on par with 2D appearance features in recognising AUs 4, 6, 10, 12, and 25, but not the entire set of AUs. To understand this discrepancy, we conduct a correspondence analysis between valence-arousal and AUs, which suggests that accurate prediction of valence-arousal may require knowledge of only a few AUs.
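The abstract reports results as concordance correlation coefficients (CCC), the standard agreement metric in valence-arousal estimation. As an illustration only (not code from the paper), a minimal sketch of the usual population-statistics form, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²):

```python
def concordance_correlation(x, y):
    """Concordance correlation coefficient between two equal-length
    sequences of predictions and gold-standard annotations.

    Uses population (biased) variance and covariance, as is common
    in affect-recognition evaluation.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # CCC penalises both scale/location bias (mean shift) and low correlation.
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

Unlike Pearson correlation, CCC drops below 1 for predictions that are perfectly correlated but shifted or rescaled, e.g. a constant offset between predictions and annotations lowers the score.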