Academic Article

Facial Video-Based Remote Physiological Measurement via Self-Supervised Learning
Document Type
Periodical
Author
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):13844-13859, Nov. 2023
Subject
Computing and Processing
Bioengineering
Videos
Frequency estimation
Physiology
Loss measurement
Training
Skin
Faces
Remote physiological measurement
self-supervised learning
frequency augmentation
local rPPG expert
frequency-inspired losses
Language
ISSN
0162-8828
2160-9292
1939-3539
Abstract
Facial video-based remote physiological measurement aims to estimate remote photoplethysmography (rPPG) signals from human facial videos and then measure multiple vital signs (e.g., heart rate, respiration frequency) from the rPPG signals. Recent approaches achieve this by training deep neural networks, which normally require abundant facial videos and synchronously recorded photoplethysmography (PPG) signals for supervision. However, collecting such annotated corpora is not easy in practice. In this paper, we introduce a novel frequency-inspired self-supervised framework that learns to estimate rPPG signals from facial videos without the need for ground-truth PPG signals. Given a video sample, we first augment it into multiple positive/negative samples that contain signal frequencies similar/dissimilar to those of the original. Specifically, positive samples are generated using spatial augmentation; negative samples are generated via a learnable frequency augmentation module, which performs a non-linear signal frequency transformation on the input without excessively changing its visual appearance. Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from the augmented samples. It encodes complementary pulsation information from different face regions and aggregates them into one rPPG prediction. Finally, we propose a series of frequency-inspired losses, i.e., a frequency contrastive loss, a frequency ratio consistency loss, and a cross-video frequency agreement loss, for the optimization of the rPPG signals estimated from the multiple augmented video samples. We conduct rPPG-based heart rate, heart rate variability, and respiration frequency estimation on five standard benchmarks. The experimental results demonstrate that our method improves the state of the art by a large margin.
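The vital-sign readout the abstract describes boils down to estimating the dominant frequency of a recovered rPPG waveform (e.g., the heart-rate peak). A minimal sketch of that spectral step, assuming a clean rPPG signal is already available (the helper `dominant_frequency` and the 0.7-4.0 Hz heart-rate band are illustrative choices, not the authors' code):

```python
import numpy as np

def dominant_frequency(signal, fs, band=(0.7, 4.0)):
    """Return the dominant frequency (Hz) of an rPPG signal, found as
    the power-spectrum peak within a plausible heart-rate band."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                 # remove DC offset
    power = np.abs(np.fft.rfft(signal)) ** 2        # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])  # keep heart-rate band only
    return freqs[mask][np.argmax(power[mask])]

# Example: a synthetic 1.2 Hz pulse (72 bpm) sampled at 30 fps for 10 s
fs = 30.0
t = np.arange(0, 10, 1.0 / fs)
rppg = np.sin(2 * np.pi * 1.2 * t)
hr_bpm = dominant_frequency(rppg, fs) * 60.0       # -> 72.0 bpm
```

The same band-limited spectral-peak readout extends to respiration frequency by swapping in a lower frequency band; the paper's frequency-inspired losses operate on exactly this kind of spectral representation of the predicted signals.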