Academic Article

Multimodal Engagement Analysis From Facial Videos in the Classroom
Document Type
Periodical
Source
IEEE Transactions on Affective Computing, 14(2):1012-1027, Jun. 2023
Subject
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Observers
Magnetic heads
Feature extraction
Videos
Psychology
Computer vision
Affective computing
educational technology
nonverbal behaviour understanding
Language
English
ISSN
1949-3045 (print)
2371-9850 (electronic)
Abstract
Student engagement is a key component of learning and teaching, and a plethora of automated methods have been proposed to measure it. Whereas most of the literature explores student engagement analysis in computer-based learning, often in the lab, we focus on classroom instruction in authentic learning environments. We collected audiovisual recordings of secondary school classes over a one-and-a-half-month period, acquired continuous engagement labels per student (N=15) in repeated sessions, and explored computer vision methods to classify engagement from facial videos. We learned deep embeddings for attentional and affective features by training Attention-Net for head pose estimation and Affect-Net for facial expression recognition on previously collected large-scale datasets. We used these representations to train engagement classifiers on our data, in single- and multiple-channel settings, taking temporal dependencies into account. The best-performing engagement classifiers achieved student-independent AUCs of .620 and .720 for grades 8 and 12, respectively, with attention-based features outperforming affective features. Score-level fusion either improved the engagement classifiers or was on par with the best-performing modality. We also investigated the effect of personalization and found that only 60 seconds of person-specific data, selected by margin uncertainty of the base classifier, yielded an average AUC improvement of .084.
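
Two techniques named in the abstract, score-level fusion of per-modality engagement scores and selection of person-specific data by margin uncertainty, can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation: the function names, the equal fusion weight, and the 10-second clip length are illustrative assumptions only.

import numpy as np

def margin_uncertainty(proba: np.ndarray) -> np.ndarray:
    # Margin uncertainty per sample: 1 minus the gap between the two
    # largest class probabilities; larger values mean the base classifier
    # is less certain about that sample.
    top2 = np.sort(proba, axis=1)[:, -2:]            # two largest probabilities
    return 1.0 - (top2[:, 1] - top2[:, 0])

def select_personalization_clips(proba, clip_seconds, budget_seconds=60.0):
    # Pick the most uncertain clips until roughly `budget_seconds`
    # (e.g., 60 s as in the abstract) of person-specific data is selected.
    order = np.argsort(-margin_uncertainty(proba))   # most uncertain first
    chosen, total = [], 0.0
    for idx in order:
        if total >= budget_seconds:
            break
        chosen.append(idx)
        total += clip_seconds[idx]
    return np.array(chosen)

def score_level_fusion(attention_scores, affect_scores, w_attention=0.5):
    # Late (score-level) fusion: weighted average of the attention-based
    # and affect-based engagement scores; the 0.5 weight is an assumption.
    return w_attention * attention_scores + (1.0 - w_attention) * affect_scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.dirichlet([1.0, 1.0], size=200)          # synthetic (engaged, not engaged) probabilities
    secs = np.full(200, 10.0)                        # assume 10-second clips
    picked = select_personalization_clips(p, secs)   # about 6 clips, i.e. roughly 60 s
    fused = score_level_fusion(rng.random(200), rng.random(200))

The sketch only demonstrates the selection and fusion logic on synthetic scores; in the paper these operations would act on the outputs of the attention- and affect-based engagement classifiers described above.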