Academic Paper

SA-BiGCN: Bi-Stream Graph Convolution Networks With Spatial Attentions for the Eye Contact Detection in the Wild
Document Type
Periodical
Source
IEEE Transactions on Intelligent Transportation Systems (IEEE Trans. Intell. Transport. Syst.), 25(2):2089-2100, Feb. 2024
Subject
Transportation
Aerospace
Communication, Networking and Broadcast Technologies
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Joints
Bones
Feature extraction
Task analysis
Data models
Pedestrians
Data mining
Eye contact detection
skeleton graph
graph convolution networks
spatial attention
Language
ISSN
1524-9050
1558-0016
Abstract
Eye contact is essential for conveying information and intention in the wild (e.g., urban streets or parking lots), where vehicles and pedestrians mix. Compared with image data, human skeleton data are considered robust to unconstrained surroundings and illumination. However, skeleton graph-based approaches are mainly used for action recognition. It is challenging to apply them directly to the eye contact detection task, which is momentary and dynamic given the complex wild environment. This paper proposes a Bi-stream Spatial Attention Graph Convolution Network (SA-BiGCN) for eye contact detection in the wild. We design a directed, nose-centric skeleton graph to capture relevant, hierarchical information and its interactions. We also propose a bi-stream graph convolution network model with spatial attention to dynamically extract and fuse skeleton joint and bone information. The model was validated by comparison with state-of-the-art models on three large-scale public datasets: JAAD, PIE, and LOOK. The results highlight the accuracy and generalization performance of the proposed SA-BiGCN model in detecting eye contact in the wild. The ablation analysis validates the importance of the skeleton graph design and of the spatial attention mechanism in the feature fusion process, as well as the model's robustness against noisy skeleton data, namely part occlusions, block occlusions, random occlusions, and random deviations.
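The abstract names three ingredients: a directed, nose-centric skeleton graph, separate joint and bone streams, and spatial attention in the fusion step. The following is a minimal NumPy sketch of how such pieces could fit together, not the authors' implementation. The 17-keypoint layout, the `PARENT` table, the single-layer graph convolution, and the softmax attention pooling are all illustrative assumptions; the abstract does not specify the joint set, the edges, or the layer design.

```python
import numpy as np

# Assumption: a COCO-style 17-keypoint order (0 = nose). PARENT[i] is the joint
# one step closer to the nose; the nose is its own parent (root). The paper's
# actual joint set and edge list are not given in the abstract.
PARENT = [0, 0, 0, 1, 2, 0, 0, 5, 6, 7, 8, 5, 6, 11, 12, 13, 14]

def directed_adjacency(parent):
    """Adjacency with self-loops and edges pointing away from the nose."""
    n = len(parent)
    adj = np.eye(n)
    for child, par in enumerate(parent):
        if child != par:
            adj[par, child] = 1.0
    return adj

def bone_features(joints_xy, parent):
    """Bone vector of each joint: its position minus its parent's position."""
    return joints_xy - joints_xy[np.asarray(parent)]

def graph_conv(adj, x, w):
    """One graph convolution: average over incoming edges, then project."""
    incoming = adj.T  # row i lists the sources feeding joint i
    norm = incoming / incoming.sum(axis=1, keepdims=True)
    return norm @ x @ w

def attention_pool(h, v):
    """Hypothetical spatial attention: softmax over joints, weighted sum."""
    scores = h @ v
    e = np.exp(scores - scores.max())
    weights = e / e.sum()
    return weights @ h, weights

def bi_stream_forward(joints_xy, parent, w_joint, w_bone, v_joint, v_bone):
    """Run joint and bone streams separately, then fuse by concatenation."""
    adj = directed_adjacency(parent)
    bones = bone_features(joints_xy, parent)
    h_joint = np.maximum(graph_conv(adj, joints_xy, w_joint), 0.0)  # ReLU
    h_bone = np.maximum(graph_conv(adj, bones, w_bone), 0.0)
    p_joint, a_joint = attention_pool(h_joint, v_joint)
    p_bone, a_bone = attention_pool(h_bone, v_bone)
    return np.concatenate([p_joint, p_bone]), a_joint, a_bone
```

With random weights of shape `(2, 8)` and attention vectors of length 8, `bi_stream_forward` returns a 16-dimensional fused descriptor plus one attention weight per joint for each stream. A classifier head for the eye contact label would sit on top of that descriptor; it is omitted here because the abstract does not describe it.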