Academic Paper

Harmonizing local and global features: enhanced hand gesture segmentation using synergistic fusion of CNN and transformer networks
Document Type
Original Paper
Source
Signal, Image and Video Processing. 18(8-9):5579-5588
Subject
Hand gesture segmentation
Feature fusion
Transformer
Multi-scale
Language
English
ISSN
1863-1703
1863-1711
Abstract
Hand gesture segmentation is an important research topic in computer vision. Despite ongoing efforts, accurate gesture segmentation remains challenging because of factors such as gesture morphology and intricate backgrounds. To address these challenges, we propose a novel hand gesture segmentation approach that combines the strengths of Convolutional Neural Networks (CNNs) for local feature extraction and Transformer networks for global feature integration. Specifically, we design two feature fusion modules. The first employs an attention mechanism to learn how to fuse the features extracted by the CNN and the Transformer. The second uses a combination of group convolution and activation functions to implement gating mechanisms, enhancing the response of crucial features while minimizing interference from weaker ones. Our proposed method achieves mIoU scores of 93.53%, 97.25%, and 90.39% on the OUHANDS, HGR1, and EgoHands hand gesture datasets, respectively, outperforming state-of-the-art methods.
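The abstract reports results as mIoU (mean Intersection over Union). As a minimal sketch of how that metric is commonly computed for segmentation masks, the following pure-Python function averages the per-class IoU over a predicted mask and a ground-truth mask. The function name `mean_iou`, the two-class (background/hand) labeling, and the choice to skip classes absent from both masks are illustrative assumptions; the record does not state the paper's exact evaluation protocol.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes for two same-shaped 2-D label masks.

    Assumption: labels are integers in [0, num_classes), e.g. 0 = background
    and 1 = hand. This is a hypothetical helper, not the paper's code.
    """
    ious = []
    for c in range(num_classes):
        inter = 0  # pixels labeled c in both masks
        union = 0  # pixels labeled c in at least one mask
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred, in_target = (p == c), (t == c)
                inter += int(in_pred and in_target)
                union += int(in_pred or in_target)
        if union > 0:  # skip classes that appear in neither mask
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 masks: each class overlaps on 2 of the 4 pixels in its union.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = mean_iou(pred, target)
```

For the toy masks above, each class has an intersection of 2 pixels and a union of 4, so the per-class IoU is 0.5 for both classes and the mean is 0.5. Benchmark figures such as those in the abstract are typically aggregated over a whole test set, and that aggregation step is not shown here.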