Academic Article

Multibranch Joint Representation Learning Based on Information Fusion Strategy for Cross-View Geo-Localization
Document Type
Periodical
Source
IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-16, 2024
Subject
Geoscience
Signal Processing and Analysis
Feature extraction
Task analysis
Representation learning
Location awareness
Deep learning
Context modeling
Layout
Geo-localization
hybrid information fusion strategies (IFSs)
joint representation learning
multibranch
Language
English
ISSN
0196-2892
1558-0644
Abstract
Cross-view geo-localization refers to recognizing images of the same geographic target captured from different platforms (such as drone-view, satellite-view, and ground-view). It is challenging because image capture across platforms, coupled with extreme viewpoint variations, can significantly change the visual content of the images. Existing methods mainly focus on mining fine-grained features or the contextual information of neighboring areas, but ignore the complete information of the entire image and the association between the contextual information of adjacent regions. Therefore, a multibranch joint representation learning network based on information fusion strategies (IFSs) is proposed to address this cross-view geo-localization problem. First, feature information is extracted from the image through a global information fusion (GIF) branch and a local information fusion (LIF) branch, which help the network learn discriminative information across different images. In addition, a local-guided-GIF (LGGIF) branch is introduced so that local information assists the global features, enhancing the learning of potential information in the images. Second, a different IFS is introduced in each branch to increase the extraction of contextual information by expanding the global receptive field, thereby improving model performance. Finally, a series of experiments is carried out on four prevailing benchmark datasets, namely the University-1652, SUES-200, CVUSA, and CVACT datasets. The quantitative comparisons clearly indicate that the proposed framework performs strongly. For example, compared with some state-of-the-art methods, the improvements in R@1 and AP on the University-1652 dataset are 1.91% and 2.18% on one task and 1.55% and 2.99% on the other, respectively.
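A minimal sketch of how such a three-branch layout could be wired up is given below, in PyTorch. All names here (MultiBranchGeoNet, guide, the stripe-based part pooling, and the gating-style fusion) are illustrative assumptions for this record; the abstract does not specify the paper's actual GIF, LIF, and LGGIF designs.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultiBranchGeoNet(nn.Module):
    # Hypothetical three-branch model: a global branch (GIF-like), a
    # part-based local branch (LIF-like), and a branch in which the local
    # parts re-weight the global descriptor (LGGIF-like).
    def __init__(self, num_classes: int, num_parts: int = 4, feat_dim: int = 2048):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")
        # Shared extractor: everything up to (not including) avgpool/fc.
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # Global branch: one descriptor for the whole image.
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        # Local branch: horizontal-stripe pooling as a stand-in for
        # neighboring-region context.
        self.part_pool = nn.AdaptiveAvgPool2d((num_parts, 1))
        # Local-guided branch: pooled parts gate the global descriptor.
        self.guide = nn.Sequential(
            nn.Linear(num_parts * feat_dim, feat_dim),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor):
        fmap = self.backbone(x)                 # (B, 2048, H, W)
        g = self.global_pool(fmap).flatten(1)   # global feature, (B, 2048)
        p = self.part_pool(fmap).flatten(1)     # concatenated part features
        lg = g * self.guide(p)                  # local-guided global feature
        return self.classifier(lg), g, p
```

In this sketch, a forward pass returns class logits for place-ID training together with the raw global and part features, which would typically be used for cross-view retrieval at inference time.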