Journal Article

A Training-Free, Lightweight Global Image Descriptor for Long-Term Visual Place Recognition Toward Autonomous Vehicles
Document Type
Periodical
Source
IEEE Transactions on Intelligent Transportation Systems, 25(2):1291-1302, Feb. 2024
Subject
Transportation
Aerospace
Communication, Networking and Broadcast Technologies
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Keywords
Semantics
Visualization
Feature extraction
Task analysis
Autonomous vehicles
Skeleton
Aggregates
Visual place recognition
loop detection
SLAM
semantic understanding
Language
English
ISSN
1524-9050 (Print)
1558-0016 (Electronic)
Abstract
Long-term visual place recognition (VPR) has recently become a popular research topic in the field of autonomous driving. In urban scenarios, variations in scene appearance caused by changes in season and illumination pose great challenges for scene description. Several learning-based VPR techniques can learn latent descriptors invariant to appearance variations and show excellent performance on long-term VPR tasks. However, these methods require large datasets and substantial computational resources (e.g., GPUs) for training and inference, and mobile platforms such as autonomous vehicles often cannot provide sufficient computing power. To address this issue, this paper proposes a training-free, lightweight global image descriptor named SSR-VLAD for VPR. The descriptor runs accurately in real time without GPUs, even on embedded platforms. The contribution of this work is twofold: (1) a novel semantic skeleton representation (SSR) is proposed to describe the semantic spatial distribution of a scene by using its semantic spatial context; (2) inspired by the Vector of Locally Aggregated Descriptors (VLAD), a spatial-temporal aggregation framework is constructed to combine all SSR features into a single SSR-VLAD descriptor, encoding the spatial and temporal information of a scene into a fixed-size global descriptor. SSR-VLAD is robust to appearance variations of scenes. On three public datasets with challenging urban scenes, experimental results show that SSR-VLAD achieves VPR performance competitive with several state-of-the-art (SoTA) methods. Additionally, SSR-VLAD attains SoTA real-time computational performance with lower RAM consumption in computationally constrained scenarios.
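
To make the aggregation idea concrete, below is a minimal NumPy sketch of generic VLAD aggregation, the technique the abstract names as the inspiration for SSR-VLAD: local descriptors are hard-assigned to cluster centers (visual words), per-cluster residuals are summed, and the concatenated result is power- and L2-normalized into one fixed-size vector. The function name, descriptor dimensionality, and normalization choices here are illustrative assumptions; this is not the paper's SSR feature pipeline.

import numpy as np

def vlad_aggregate(descriptors: np.ndarray, centers: np.ndarray) -> np.ndarray:
    # descriptors: (N, D) local features; centers: (K, D) visual words
    # (both illustrative; the paper aggregates its own SSR features).
    K, D = centers.shape
    # Hard-assign each local descriptor to its nearest visual word.
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assignment = np.argmin(dists, axis=1)
    v = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assignment == k]
        if len(members) > 0:
            # Accumulate residuals between member descriptors and their center.
            v[k] = (members - centers[k]).sum(axis=0)
    v = v.ravel()  # concatenate per-word residuals: fixed size K * D
    # Signed square-root (power) normalization, then global L2 normalization.
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

Query and reference places can then be matched by the dot product (cosine similarity) of their aggregated vectors; the fixed output size is what makes such a global descriptor cheap to store and compare on embedded hardware.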