Academic Paper

LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation
Document Type
Periodical
Source
IEEE Transactions on Multimedia (IEEE Trans. Multimedia). 26:1158-1168 2024
Subject
Components, Circuits, Devices and Systems
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Laser radar
Cameras
Point cloud compression
Semantics
Semantic segmentation
Three-dimensional displays
Synchronization
LiDAR and Camera
LiDAR segmentation
contextual information
weak spatiotemporal synchronization
Language
ISSN
1520-9210
1941-0077
Abstract
Cameras and 3D LiDAR sensors have become indispensable in modern autonomous vehicles. Cameras provide fine-grained texture and color information in 2D space, while LiDAR captures more precise, longer-range distance measurements of the surrounding environment. Because the two sensors carry complementary information, fusing the two modalities is a desirable option. However, two primary challenges hinder camera-LiDAR fusion: how to effectively fuse the information from the two modalities, and how to precisely align them given the weak spatiotemporal synchronization problem. This article proposes a coarse-to-fine LiDAR and camera fusion-based network, named LIF-Seg, for LiDAR segmentation. To address the first challenge, unlike previous works that fuse point cloud and image information in a one-to-one manner, the proposed method introduces a simple but effective early-fusion strategy to fully utilize the contextual information of images. To address the second, the weak spatiotemporal synchronization problem, an offset rectification approach is designed to align the features of the two modalities. Together, these two components enable effective camera-LiDAR fusion. Experimental results on the nuScenes dataset show that LIF-Seg outperforms existing methods by a large margin. Ablation studies and analyses further illustrate that LIF-Seg can effectively address the weak spatiotemporal synchronization problem.
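
The contrast the abstract draws between one-to-one fusion and fusion that uses image context can be illustrated with a minimal sketch. It is not the LIF-Seg implementation: the pinhole projection, the k x k patch size, the zero padding, and all function names (`project_point`, `gather_patch`, `early_fuse`) are assumptions chosen for illustration, and the offset rectification step is not shown. Setting `k=1` reduces the sketch to one-to-one fusion.

```python
from typing import List, Sequence, Tuple

# Hypothetical types: a point is (x, y, z) in the camera frame, z pointing forward.
Point = Tuple[float, float, float]


def project_point(p: Point, fx: float, fy: float, cx: float, cy: float) -> Tuple[int, int]:
    """Pinhole projection of a camera-frame point to integer pixel coordinates (u, v)."""
    x, y, z = p
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    return u, v


def gather_patch(image: Sequence[Sequence[float]], u: int, v: int, k: int = 3) -> List[float]:
    """Return the k x k neighbourhood centred on (u, v), row-major, zero-padded at borders."""
    h, w = len(image), len(image[0])
    r = k // 2
    patch: List[float] = []
    for dv in range(-r, r + 1):
        for du in range(-r, r + 1):
            uu, vv = u + du, v + dv
            inside = 0 <= uu < w and 0 <= vv < h
            patch.append(float(image[vv][uu]) if inside else 0.0)
    return patch


def early_fuse(points: Sequence[Point], image: Sequence[Sequence[float]],
               fx: float, fy: float, cx: float, cy: float, k: int = 3) -> List[List[float]]:
    """Concatenate each point's coordinates with the image patch around its projection.

    Points behind the camera (z <= 0) are skipped. The result is one feature
    vector per remaining point: [x, y, z, patch values...].
    """
    fused: List[List[float]] = []
    for p in points:
        if p[2] <= 0:
            continue
        u, v = project_point(p, fx, fy, cx, cy)
        fused.append([float(c) for c in p] + gather_patch(image, u, v, k))
    return fused
```

In this sketch each point carries its surrounding pixels along with the single pixel it projects to, so a small projection error still leaves the relevant image content inside the gathered patch. A real pipeline would also need the LiDAR-to-camera extrinsic transform and multi-channel images, both omitted here.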