Academic Journal Article

Coarse-to-fine cascaded 3D hand reconstruction based on SSGC and MHSA
Document Type
Original Paper
Source
The Visual Computer: International Journal of Computer Graphics. :1-14
Subject
Multi-head self-attention
GCN
3D hand reconstruction
Language
English
ISSN
0178-2789
1432-2315
Abstract
Graph convolutional networks (GCNs) have recently become the mainstream approach to 3D hand pose and mesh estimation, but several issues still hinder their further development. First, previous work compensated for the small receptive field of vanilla graph convolution by simply stacking multiple GCN layers, which can over-smooth the features and thereby mislead hand pose estimation. Second, most attempts reconstructed the hand mesh directly from the 3D pose in a single step, ignoring the significant gap between a sparse pose and a dense mesh and leading to incorrect results and unstable training. To address these issues, a novel framework is proposed that integrates multi-head self-attention, spatial-based graph convolution, and spectral-based graph convolution for 3D hand pose and mesh estimation. The framework comprises two main modules: SemGraAttention and ChebGconv blocks. SemGraAttention enables all hand joints to interact over a global receptive field without weakening the topology of the hand. As a complement, ChebGconv models implicit semantic relations among joints to further boost performance. In addition, a coarse-to-fine strategy is adopted to reconstruct the dense hand mesh from the sparse pose step by step, which contributes to refined results and stable training. Extensive evaluations on multiple 3D benchmarks demonstrate that the model outperforms a series of 3D hand pose and mesh estimation approaches.
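
The abstract does not give the internals of the ChebGconv block. As a rough illustration of the spectral-based graph convolution it refers to, the sketch below implements the generic Chebyshev-polynomial graph convolution in NumPy, which may differ from the authors' module. The function names, the five-joint chain graph, and the feature sizes are all hypothetical choices made for this example.

```python
import numpy as np


def scaled_laplacian(adj):
    """Rescale the symmetric normalized Laplacian so its spectrum lies in [-1, 1]."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lambda_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lambda_max - np.eye(n)


def cheb_conv(x, l_tilde, weights):
    """Chebyshev graph convolution: sum_k T_k(l_tilde) @ x @ weights[k].

    x:       (num_joints, f_in) joint features
    l_tilde: (num_joints, num_joints) scaled Laplacian
    weights: (K, f_in, f_out), one weight matrix per polynomial order
    """
    num_orders = weights.shape[0]
    t_prev = x                       # T_0(L) x = x
    out = t_prev @ weights[0]
    if num_orders > 1:
        t_curr = l_tilde @ x         # T_1(L) x = L x
        out = out + t_curr @ weights[1]
        for k in range(2, num_orders):
            # Recurrence: T_k = 2 L T_{k-1} - T_{k-2}
            t_next = 2.0 * (l_tilde @ t_curr) - t_prev
            out = out + t_next @ weights[k]
            t_prev, t_curr = t_curr, t_next
    return out


# Hypothetical toy graph: one finger as a chain of 5 joints (wrist to fingertip).
num_joints = 5
adj = np.zeros((num_joints, num_joints))
for i in range(num_joints - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

l_tilde = scaled_laplacian(adj)
rng = np.random.default_rng(0)
features = rng.normal(size=(num_joints, 3))   # e.g. 3D joint coordinates
weights = rng.normal(size=(3, 3, 8))          # K = 3 polynomial orders
output = cheb_conv(features, l_tilde, weights)
```

With K polynomial orders, each joint's output depends only on joints within K - 1 hops, so a single layer can widen the receptive field without stacking several vanilla GCN layers.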