Journal Article

Landmark-Aware and Part-Based Ensemble Transfer Learning Network for Static Facial Expression Recognition from Images
Document Type
Periodical
Source
IEEE Transactions on Artificial Intelligence, 4(2):349-361, Apr. 2023
Subject
Computing and Processing
Face recognition
Transfer learning
Task analysis
Faces
Computational modeling
Training
Location awareness
Convolutional neural network (CNN)
deep learning
ensemble network
facial expression recognition (FER)
facial landmarks localization (FLL)
gradient-weighted class activation mapping (grad-CAM)
transfer learning (TL)
Language
English
ISSN
2691-4581
Abstract
Facial expression recognition from images is a challenging problem in computer vision applications. The convolutional neural network (CNN), the state-of-the-art method for various computer vision tasks, has had limited success in predicting expressions from faces under extreme pose, illumination, and occlusion conditions. To mitigate this issue, CNNs are often accompanied by techniques like transfer, multitask, or ensemble learning that provide high accuracy at the cost of increased computational complexity. In this article, the authors propose a part-based ensemble transfer learning network that models how humans recognize facial expressions by correlating visual patterns emanating from facial muscles’ motor movements with a specific expression. The proposed network performs transfer learning from facial landmark localization to facial expression recognition. It consists of five subnetworks, each of which performs transfer learning from one of five facial landmark subsets (eyebrows, eyes, nose, mouth, or jaw) to expression classification. The network’s performance is evaluated using the Cohn-Kanade (CK+), Japanese Female Facial Expression (JAFFE), and Static Facial Expressions in the Wild (SFEW) datasets, and it outperforms the benchmark for the CK+ and JAFFE datasets by 0.51% and 5.34%, respectively. Additionally, the proposed ensemble network consists of only 1.65 M model parameters, ensuring computational efficiency during training and real-time deployment. Gradient-weighted class activation mapping visualizations of the network reveal the complementary nature of its subnetworks, a key design parameter of an effective ensemble network. Lastly, cross-dataset evaluation results show that the proposed ensemble has a high generalization capacity, making it suitable for real-world usage.
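The fusion step of such a part-based ensemble can be sketched in plain Python: each part subnetwork produces class scores, and the ensemble averages the resulting probability distributions before picking the top class. This is an illustrative sketch only, not the authors' implementation; the part names, the 7-class expression set, and the averaging rule are assumptions for illustration.

```python
import math

# Assumed part subsets and expression classes (illustrative, not from the paper's code).
PARTS = ["eyebrows", "eyes", "nose", "mouth", "jaw"]
CLASSES = ["anger", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def softmax(scores):
    """Convert raw class scores to a probability distribution."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(part_scores):
    """Average per-part class probabilities and return the top class.

    part_scores: dict mapping part name -> list of raw class scores,
    one entry per subnetwork.
    """
    probs = [softmax(part_scores[p]) for p in PARTS]
    fused = [sum(col) / len(PARTS) for col in zip(*probs)]
    return CLASSES[fused.index(max(fused))], fused

# Example: the mouth and eyes subnetworks both favor "happy" (index 3).
scores = {p: [0.0] * len(CLASSES) for p in PARTS}
scores["mouth"][3] = 3.0
scores["eyes"][3] = 2.0
label, fused = ensemble_predict(scores)
```

Averaging probabilities (rather than hard votes) lets a strongly confident subnetwork, such as the mouth network for smiles, dominate the fused prediction while still benefiting from the complementary parts.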