Journal Article

CVTNet: A Cross-View Transformer Network for LiDAR-Based Place Recognition in Autonomous Driving Environments
Document Type
Periodical
Source
IEEE Transactions on Industrial Informatics, 20(3):4039-4048, Mar. 2024
Subject
Power, Energy and Industry Applications
Signal Processing and Analysis
Computing and Processing
Communication, Networking and Broadcast Technologies
Feature extraction
Nonhomogeneous media
Laser radar
Fuses
Transformers
Point cloud compression
Autonomous vehicles
Autonomous driving
LiDAR place recognition (LPR)
multiview fusion
transformer network
Language
English
ISSN
1551-3203 (Print)
1941-0050 (Electronic)
Abstract
LiDAR-based place recognition (LPR) is one of the most crucial components of autonomous vehicles for identifying previously visited places in GPS-denied environments. Most existing LPR methods use mundane representations of the input point cloud without considering different views, which may not fully exploit the information from LiDAR sensors. In this article, we propose a cross-view transformer-based network, dubbed CVTNet, to fuse the range image views (RIVs) and bird's eye views (BEVs) generated from the LiDAR data. It extracts correlations within each view using intra-transformers and between the two different views using inter-transformers. Based on that, our proposed CVTNet generates a yaw-angle-invariant global descriptor for each laser scan end-to-end online and retrieves previously seen places by descriptor matching between the current query scan and a prebuilt database. We evaluate our approach on three datasets collected with different sensor setups and environmental conditions. The experimental results show that our method outperforms state-of-the-art LPR methods with strong robustness to viewpoint changes and long time spans. Furthermore, our approach achieves real-time performance, running faster than the typical LiDAR frame rate.
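
The abstract outlines the pipeline at a high level: column-wise features from the two views are refined by self-attention within each view (intra-transformers), fused by cross-attention between views (inter-transformers), pooled into a yaw-angle-invariant global descriptor, and matched against a prebuilt database. The PyTorch sketch below illustrates that flow only; the module sizes, single-layer encoders, max-pooling aggregation, and the retrieve helper are all assumptions for readability, not the authors' implementation.

    # Hedged sketch of a cross-view fusion pipeline in the spirit of the
    # abstract. All names, sizes, and pooling choices are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_layer(dim, heads):
        return nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                          batch_first=True)

    class CrossViewFusion(nn.Module):
        """Fuses column-wise RIV and BEV features with intra- and
        inter-attention (assumed layout, not the paper's exact modules)."""
        def __init__(self, dim=256, heads=4):
            super().__init__()
            # Intra-transformers: self-attention within each view.
            self.intra_riv = nn.TransformerEncoder(make_layer(dim, heads), 1)
            self.intra_bev = nn.TransformerEncoder(make_layer(dim, heads), 1)
            # Inter-transformer: cross-attention between the two views.
            self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.head = nn.Linear(dim, dim)

        def forward(self, riv_cols, bev_cols):
            # riv_cols, bev_cols: (B, W, C) column features, one token per
            # yaw bin; a yaw rotation circularly shifts (permutes) tokens.
            r = self.intra_riv(riv_cols)
            b = self.intra_bev(bev_cols)
            fused, _ = self.inter(r, b, b)   # RIV queries attend to BEV
            # Permutation-invariant pooling over yaw bins gives a
            # yaw-angle-invariant descriptor (max-pooling is one choice).
            desc = self.head(fused.max(dim=1).values)
            return F.normalize(desc, dim=-1) # unit norm for cosine matching

    # Retrieval by descriptor matching against a prebuilt database.
    def retrieve(query_desc, db_descs, top_k=1):
        # query_desc: (C,), db_descs: (N, C), both L2-normalized.
        sims = db_descs @ query_desc         # cosine similarities
        return torch.topk(sims, k=top_k)

    if __name__ == "__main__":
        net = CrossViewFusion()
        riv = torch.randn(1, 900, 256)       # e.g., 900 yaw bins (assumed)
        bev = torch.randn(1, 900, 256)
        q = net(riv, bev)[0]
        db = F.normalize(torch.randn(1000, 256), dim=-1)  # mock database
        print(retrieve(q, db, top_k=5))

Yaw invariance in this sketch follows from treating each yaw bin as one token: a yaw rotation circularly shifts the columns, self-attention without positional encodings is permutation-equivariant, and max-pooling discards token order. The paper's actual aggregation may differ.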