학술논문

Multiscale Cross-Modal Homogeneity Enhancement and Confidence-Aware Fusion for Multispectral Pedestrian Detection
Document Type
Periodical
Source
IEEE Transactions on Multimedia IEEE Trans. Multimedia Multimedia, IEEE Transactions on. 26:852-863 2024
Subject
Components, Circuits, Devices and Systems
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Feature extraction
Lighting
Proposals
Task analysis
Convolutional neural networks
Training
Sun
Confidence measurement fusion
confidence transfer loss
cross-modal homogeneity enhancement
multispectral pedestrian detection
Language
ISSN
1520-9210
1941-0077
Abstract
Multispectral pedestrian detection has shown many advantages in a variety of environments, particularly poor illumination conditions, by leveraging visible-thermal modalities. However, in-depth insight into distinguishing the complementary content of multimodal data and exploring the extent of multimodal feature fusion is still lacking. In this paper, we propose a novel multispectral pedestrian detector with multiscale cross-modal homogeneity enhancement and confidence-aware feature fusion. RGB and thermal streams are constructed to extract features and generate candidate proposals. During feature extraction, multiscale cross-modal homogeneity enhancement is proposed to enhance single-modal features using the separated homogeneous features via modal interactions. Homogeneity features encode the semantic information of the scene and are extracted from the RGB-thermal pairs by employing a channel attention mechanism. Proposals from two modalities are united to obtain multimodal proposals. Then, confidence measurement fusion is proposed to achieve multispectral feature fusion in each proposal by measuring the internal confidence of each modality and the interaction confidence between modalities. In addition, a confidence transfer loss function is designed to focus more on hard-to-detect samples during training. Experimental results on two challenging datasets demonstrate that the proposed method achieves better performance compared to existing methods.