Academic Journal Article

DIMGNet: A Transformer-Based Network for Pedestrian Reidentification With Multi-Granularity Information Mutual Gain
Document Type
Periodical
Source
IEEE Transactions on Multimedia, vol. 26, pp. 6513-6528, 2024
Subject
Components, Circuits, Devices and Systems
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Pedestrians
Feature extraction
Transformers
Computer architecture
Task analysis
Cameras
Data mining
Cross-attention mechanism
information mutual gain
pedestrian reidentification (ReID)
transformer
Language
English
ISSN
1520-9210 (Print)
1941-0077 (Electronic)
Abstract
Pedestrian reidentification (ReID) is a challenging task that involves identifying and retrieving specific pedestrians across different cameras and scenes. This problem has significant implications for security surveillance and has thus received substantial attention in recent years. However, traditional convolutional neural networks (CNNs) have limited receptive fields and cannot capture global information, while transformer networks, which excel at capturing long-range features, are prone to accuracy degradation due to the loss of detail. To address these limitations, we propose a transformer-based pedestrian ReID network with double-branch information mutual gain (DIMGNet), which leverages hierarchical parallel levels to support information mutual gain among multi-granularity features. Our model also incorporates an auxiliary camera information (ACI) module to improve feature representation ability. We further embed a cross-attention mechanism into the architecture to enhance mutual gain between multi-granularity features and improve feature discrimination. Finally, we introduce a shuffling technique to increase the robustness of the extracted features. We evaluate the proposed method on several benchmark datasets, including Market-1501 (Zhou et al., 2022), MSMT17 (Wei et al., 2018), DukeMTMC-reID (Ristani et al., 2016), and Occluded-Duke (Miao et al., 2019), achieving mAP values of 90.7%, 68.4%, 83.7%, and 60.6%, respectively. Our method outperforms most state-of-the-art methods, demonstrating its effectiveness.
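
Illustration (not from the paper): the minimal PyTorch sketch below shows the general idea behind cross-attention-based mutual gain between two feature branches, where one branch's tokens act as queries against the other branch's tokens as keys and values, so information can flow in both directions. The class name CrossGranularityAttention, the tensor shapes, the embedding dimension, and the single-head design are illustrative assumptions; the authors' DIMGNet architecture may differ in its details.

# A minimal sketch of cross-attention between two granularity branches.
# All names and dimensions are assumptions made for illustration only.
import torch
import torch.nn as nn


class CrossGranularityAttention(nn.Module):
    """Attend from one branch's tokens (queries) to the other branch's tokens (keys/values)."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x_query: torch.Tensor, x_context: torch.Tensor) -> torch.Tensor:
        # x_query:   (B, Nq, dim) tokens of the branch being enriched
        # x_context: (B, Nk, dim) tokens of the other granularity branch
        q = self.q_proj(x_query)
        k = self.k_proj(x_context)
        v = self.v_proj(x_context)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        # Residual connection keeps the querying branch's own information.
        return x_query + attn @ v


# Mutual gain: each branch queries the other, so information flows both ways.
coarse = torch.randn(2, 129, 768)   # e.g. global/coarse-granularity tokens
fine = torch.randn(2, 129, 768)     # e.g. local/fine-granularity tokens
cross = CrossGranularityAttention(768)
coarse_enriched = cross(coarse, fine)
fine_enriched = cross(fine, coarse)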