Academic Paper

BigSmall: Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological Measurements
Document Type
Conference
Source
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 7899-7909, Jan. 2024
Subject
Computing and Processing
Visualization
Computer vision
Pulse measurements
Computational modeling
Computer architecture
Licenses
Multitasking
Applications
Biomedical / healthcare / medicine
Remote Sensing
Language
ISSN
2642-9381
Abstract
Understanding of human visual perception has historically inspired the design of computer vision architectures. As an example, perception occurs at different scales both spatially and temporally, suggesting that the extraction of salient visual information may be made more effective by attending to specific features at varying scales. Visual changes in the body, due to physiological processes, also occur at varying scales and with modality-specific characteristic properties. Inspired by this, we present BigSmall, an efficient architecture for physiological and behavioral measurement. We present the first joint camera-based facial action, cardiac, and pulmonary measurement model. We propose a multi-branch network with wrapping temporal shift modules that yields efficiency gains and accuracy on par with task-optimized methods. We observe that fusing low-level features leads to suboptimal performance, but that fusing high-level features enables efficiency gains with negligible losses in accuracy. We experimentally validate that BigSmall significantly reduces computational cost while achieving comparable results on multiple physiological measurement tasks simultaneously with a unified model.
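The abstract names "wrapping temporal shift modules" without describing them. As a rough illustration only, the sketch below assumes the usual temporal-shift idea (one slice of channels takes its values from the previous frame, another slice from the next frame, the rest stay in place) and assumes that "wrapping" means frames shifted past either end of the clip re-enter at the other end instead of being zero-padded. The function name `wrapping_temporal_shift`, the `fold_div` parameter, and the list-of-lists layout are hypothetical and are not taken from the paper.

```python
def wrapping_temporal_shift(frames, fold_div=3):
    """Circularly shift a fraction of channels along the time axis.

    frames:   list of T frames, each a list of C per-channel features
              (spatial dimensions are omitted to keep the sketch small).
    fold_div: hypothetical parameter; C // fold_div channels are shifted
              in each direction, the remainder are left untouched.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div

    shifted = []
    for t in range(num_frames):
        # Modulo indexing wraps around the clip boundary (assumed behavior).
        prev_frame = frames[(t - 1) % num_frames]
        next_frame = frames[(t + 1) % num_frames]
        shifted.append(
            prev_frame[:fold]              # channels taken from the previous frame
            + next_frame[fold:2 * fold]    # channels taken from the next frame
            + frames[t][2 * fold:]         # channels left in place
        )
    return shifted


if __name__ == "__main__":
    # Four frames, three channels; value = 10 * frame_index + channel_index.
    clip = [[10 * t + c for c in range(3)] for t in range(4)]
    for original, mixed in zip(clip, wrapping_temporal_shift(clip)):
        print(original, "->", mixed)
```

Because the shift only re-indexes existing features, it adds no learnable parameters, which is consistent with the efficiency emphasis of the abstract; the exact channel split and placement used by the authors are not stated in this record.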