Academic Paper

Homogeneous and Heterogeneous Optimization for Unsupervised Cross-Modality Person Reidentification in Visual Internet of Things
Document Type
Periodical
Author
Source
IEEE Internet of Things Journal (IEEE Internet Things J.). 11(7):12165-12176, Apr. 2024
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Pedestrians
Feature extraction
Visualization
Internet of Things
Optimization
Data mining
Hafnium
Cross-modality person reidentification (ReID)
style adaptation
unsupervised learning
Language
ISSN
2327-4662
2372-2541
Abstract
Cross-modality visible-infrared person reidentification (VI-ReID) has attracted widespread attention due to its scalability in 24-hour video surveillance for the Visual Internet of Things (VIoT). Given sufficient annotated training data, supervised VI-ReID has achieved strong performance. However, annotating a large amount of cross-modality data is extremely time-consuming, which limits its deployment in real-world scenarios. Several existing works neglect the image-level discrepancy and cannot obtain reliable feature-level heterogeneous correlations. In this article, we propose a novel homogeneous and heterogeneous optimization with modality style adaptation (HHO) mechanism to eliminate intramodality and intermodality discrepancies without any label information for unsupervised VI-ReID. Specifically, we present a modality style adaptation strategy to transfer unlabeled cross-modality pedestrian styles, which not only increases image diversity but also bridges the intermodality gap. Meanwhile, we employ a clustering algorithm to generate pseudo labels for each modality. Homogeneous feature optimization is developed to extract intramodality pedestrian features. Furthermore, we propose heterogeneous feature optimization to eliminate the intermodality discrepancy. To this end, a heterogeneous feature search (HFS) module is designed to mine reliable cross-modality signals for each identity. These reliable heterogeneous features are constrained to form a compact feature distribution, while different identities are forced apart. The homogeneous and heterogeneous optimizations are seamlessly integrated to learn robust cross-modality features. Extensive experiments demonstrate the superiority of HHO.
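As a rough illustration of two steps named in the abstract (per-modality pseudo labels from clustering, and a search for reliable cross-modality correspondences), the toy sketch below may help. It is not the authors' implementation. The abstract does not say which clustering algorithm is used or how the HFS module selects features, so k-means and mutual-nearest-centroid matching are stand-ins chosen for this example. All function names and the 2-D "features" are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def kmeans_pseudo_labels(features, k, iters=20):
    """Assign a pseudo label (cluster index) to each feature vector.

    Assumption: k-means stands in for the unspecified clustering algorithm.
    """
    # Deterministic init: use the first k feature vectors as centroids.
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(f, centroids[c]))
                  for f in features]
        for c in range(k):
            members = [f for f, l in zip(features, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels, centroids

def mutual_nearest_clusters(cent_a, cent_b):
    """Pair clusters across modalities that are each other's nearest centroid.

    Assumption: a simple stand-in for a cross-modality reliability check.
    """
    a_to_b = [min(range(len(cent_b)), key=lambda j: dist(a, cent_b[j]))
              for a in cent_a]
    b_to_a = [min(range(len(cent_a)), key=lambda i: dist(b, cent_a[i]))
              for b in cent_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

# Toy 2-D "features" for two identities seen in each modality (made up).
visible = [[0.0, 0.0], [5.0, 5.0], [0.2, 0.1], [5.1, 4.9]]
infrared = [[4.8, 5.2], [0.1, -0.1], [5.0, 5.1], [-0.1, 0.2]]

vis_labels, vis_centroids = kmeans_pseudo_labels(visible, k=2)
ir_labels, ir_centroids = kmeans_pseudo_labels(infrared, k=2)
pairs = mutual_nearest_clusters(vis_centroids, ir_centroids)
print(vis_labels, ir_labels, pairs)
```

In this toy setup each modality is clustered independently, so the cluster indices of the two modalities are unrelated until the matching step links them. Only the matched pairs would then be treated as the same identity when pulling heterogeneous features together.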