학술논문

Improving 3D Pose Estimation For Sign Language
Document Type
Conference
Source
2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) Acoustics, Speech, and Signal Processing Workshops (ICASSPW), 2023 IEEE International Conference on. :1-5 Jun, 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Visualization
Solid modeling
Three-dimensional displays
Pose estimation
Neural networks
Gesture recognition
Kinematics
3D pose estimation
hand and body reconstruction
Language
Abstract
This work addresses 3D human pose reconstruction in single images. We present a method that combines Forward Kinematics (FK) with neural networks to ensure a fast and valid prediction of 3D pose. Pose is represented as a hierarchical tree/graph with nodes corresponding to human joints that model their physical limits. Given a 2D detection of keypoints in the image, we lift the skeleton to 3D using neural networks to predict both the joint rotations and bone lengths. These predictions are then combined with skeletal constraints using an FK layer implemented as a network layer in PyTorch. The result is a fast and accurate approach to the estimation of 3D skeletal pose. Through quantitative and qualitative evaluation, we demonstrate the method is significantly more accurate than MediaPipe in terms of both per joint positional error and visual appearance. Furthermore, we demonstrate generalization over different datasets and sign languages. The implementation in PyTorch runs at between 100-200 milliseconds per image (including CNN detection) using CPU only.