Academic Paper

Multi-Modal Hand-Object Pose Estimation With Adaptive Fusion and Interaction Learning

Document Type

Periodical

Author

Hoang, D.; Tan, P.X.; Nguyen, A.; Vu, D.; Vu, V.; Nguyen, T.; Hoang, N.; Phan, K.; Tran, D.; Nguyen, V.; Duong, Q.; Ho, N.; Tran, C.; Duong, V.; Ngo, P.

Source

IEEE Access. 12:54339-54351, 2024

Subject

Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Feature extraction
Three-dimensional displays
Shape
Pose estimation
Image color analysis
Task analysis
Solid modeling
robot vision systems
intelligent systems
deep learning
supervised learning
machine vision

Language

ISSN

2169-3536

Abstract

Hand-object configuration recovery is an important task in computer vision. Estimating the pose and shape of both hands and objects during interaction has various applications, particularly in augmented reality, virtual reality, and imitation-based robot learning. The problem is particularly challenging when the hand is interacting with objects in the environment, as this setting features both extreme occlusions and non-trivial shape deformations. While existing works estimate hand configurations (that is, pose and shape parameters) in isolation from the parameters of the object acted upon, we posit that the two problems are related and can be solved more accurately when addressed jointly. We introduce an approach that jointly learns hand and object features from color and depth (RGB-D) images. Our approach fuses appearance and geometric features adaptively, which allows it to emphasize the features that are more meaningful for hand-object configuration recovery and suppress those that are less so. We combine a deep Hough voting strategy that builds on our adaptive features with a graph convolutional network (GCN) to learn the interaction relationships between the shapes of the hand and the held object. Experimental results demonstrate that our proposed approach consistently outperforms state-of-the-art methods on popular datasets.
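
The abstract does not say how the adaptive fusion is computed. As an illustration only, one common way to fuse appearance and geometric features adaptively is a per-channel sigmoid gate, sketched below in plain Python. The function name `gated_fusion`, the weights `w_app`, `w_geo`, `bias`, and all numeric values are assumptions made for this sketch; they are not taken from the paper, whose actual fusion module may differ.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(appearance, geometry, w_app, w_geo, bias):
    """Fuse two equal-length feature vectors with a per-channel gate.

    Hypothetical sketch: in a trained network w_app, w_geo and bias would be
    learned parameters; here they are plain lists of floats.
    """
    lengths = {len(appearance), len(geometry), len(w_app), len(w_geo), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")
    fused = []
    for a, g, wa, wg, b in zip(appearance, geometry, w_app, w_geo, bias):
        # A gate near 1 favors the appearance channel, near 0 the geometric one.
        gate = sigmoid(wa * a + wg * g + b)
        fused.append(gate * a + (1.0 - gate) * g)
    return fused


# Illustrative values only (assumed, not from the paper).
rgb_feat = [0.9, 0.1, 0.5]
depth_feat = [0.2, 0.8, 0.5]
fused = gated_fusion(
    rgb_feat,
    depth_feat,
    w_app=[4.0, 4.0, 0.0],
    w_geo=[-4.0, -4.0, 0.0],
    bias=[0.0, 0.0, 0.0],
)
print(fused)
```

Each fused value is a convex combination of the two inputs, so it always lies between the appearance and geometric values for that channel; the gate only decides which source dominates.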

