Academic Paper

InfPose: Real-Time Infrared Multi-Human Pose Estimation for Edge Devices Based on Encoder-Decoder CNN Architecture
Document Type
Periodical
Source
IEEE Robotics and Automation Letters (IEEE Robot. Autom. Lett.), 9(4):3672-3679, Apr. 2024
Subject
Robotics and Control Systems
Computing and Processing
Components, Circuits, Devices and Systems
Pose estimation
Decoding
Convolution
Real-time systems
Performance evaluation
Image resolution
Head
Deep learning for visual perception
gesture, posture and facial expressions
multi-human pose estimation
infrared vision
Language
ISSN
2377-3766
2377-3774
Abstract
Despite its remarkable performance, RGB-based Multi-human Pose Estimation (MPE) has many practical limitations, for example in nighttime and smoggy environments. Infrared imaging is a valid substitute in these scenarios but needs an efficient and fast MPE method. This letter presents InfPose, an infrared MPE model based on the Encoder-Decoder CNN architecture that can run in real time on edge devices. We first built a lightweight Encoder-Decoder CNN backbone from hardware-friendly inverted residual blocks. Second, we used three methods to improve the capability of InfPose: a decoupling associative embedding head, multi-scale supervision, and cross-modal knowledge distillation. In addition, we gathered an in-the-wild infrared human pose dataset to train and evaluate our methods. Experimental results show that the proposed model is more robust and has lower latency during inference on edge GPU platforms than prevailing mainstream models. The inference time of InfPose on Xavier NX was 27.7 ms ($\approx$37 fps), while maintaining sufficient accuracy for practical use. This research can be applied to human-machine interaction for autonomous vehicles or intelligent robots at night or in other scenes with poor visibility.