Academic Paper

Domain-Invariant Similarity Activation Map Contrastive Learning for Retrieval-Based Long-Term Visual Localization
Document Type
Periodical
Source
IEEE/CAA Journal of Automatica Sinica (IEEE/CAA J. Autom. Sinica), 9(2):313-328, Feb. 2022
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
General Topics for Engineers
Robotics and Control Systems
Location awareness
Visualization
Measurement
Feature extraction
Image recognition
Pipelines
Training
Deep representation learning
place recognition
visual localization
Language
ISSN
2329-9266
2329-9274
Abstract
Visual localization is a crucial component of mobile robot and autonomous driving applications. Image retrieval is an efficient and effective technique for image-based localization. However, drastic variability in environmental conditions, e.g., illumination changes, severely degrades retrieval-based visual localization and makes it a challenging problem. In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation. Then, a novel gradient-weighted similarity activation mapping loss (Grad-SAM) is incorporated for finer localization with high accuracy. We also propose a new adaptive triplet loss to boost the contrastive learning of the embedding in a self-supervised manner. The final coarse-to-fine image retrieval pipeline is implemented as the sequential combination of models with and without the Grad-SAM loss. Extensive experiments on the CMU-Seasons dataset validate the effectiveness of the proposed approach. The strong generalization ability of our approach is verified on the RobotCar dataset using models pre-trained on the urban parts of the CMU-Seasons dataset. Our approach performs on par with or even outperforms state-of-the-art image-based localization baselines at medium or high precision, especially in challenging environments with illumination variance, vegetation, and night-time images. Moreover, real-site experiments have been conducted to validate the efficiency and effectiveness of the coarse-to-fine strategy for localization.
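The coarse-to-fine retrieval idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each database image and the query already have two embedding vectors (one from a "coarse" model, one from a "fine" model), that similarity is cosine similarity, and that the coarse stage shortlists the top-k candidates which the fine stage then re-ranks. The function names `l2_normalize` and `coarse_to_fine_retrieve` and the parameter `k` are hypothetical, and which of the two trained models plays which role is left open here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def coarse_to_fine_retrieve(q_coarse, q_fine, db_coarse, db_fine, k=5):
    """Return the index of the best database match for one query.

    Assumed two-stage scheme (illustrative only):
      1. rank all database items by cosine similarity of coarse embeddings
         and keep the top-k candidates;
      2. re-rank only those candidates by cosine similarity of fine embeddings.

    q_coarse, q_fine: 1-D arrays (query embeddings from the two models).
    db_coarse, db_fine: 2-D arrays with one row per database image.
    """
    # Stage 1: coarse shortlist over the whole database.
    coarse_sims = l2_normalize(db_coarse) @ l2_normalize(q_coarse)
    k = min(k, coarse_sims.shape[0])
    candidates = np.argsort(-coarse_sims)[:k]

    # Stage 2: fine re-ranking restricted to the shortlist.
    fine_sims = l2_normalize(db_fine[candidates]) @ l2_normalize(q_fine)
    return int(candidates[np.argmax(fine_sims)])
```

With `k` equal to 1 the fine stage has no effect and the result is the plain coarse nearest neighbor; larger `k` trades extra fine-stage comparisons for a chance to correct coarse-stage ranking errors.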