Academic Paper

Learning Implicit Depth Information for Monocular 3D Object Detection
Document Type
Conference
Source
2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), pp. 1-7, Nov. 2022
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
General Topics for Engineers
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Training
Three-dimensional displays
Object detection
Computer architecture
Detectors
Cameras
Solids
3D object detection
monocular depth estimation
autonomous driving
deep learning
Language
English
Abstract
Detecting 3D traffic participants in the surroundings of an autonomous vehicle is a challenging task. As long as the vehicle is equipped with expensive light detection and ranging (LiDAR) sensors and the weather conditions are favorable, the task appears readily solvable. As soon as one of these preconditions is no longer met, camera sensors become an important factor. However, obtaining 3D spatial information from a 2D image is difficult, and most 3D object detectors that rely solely on camera images do not deliver adequate results. For this reason, such systems need additional input information, which is not trivial to obtain when no other sensors are available. As monocular depth reconstruction began to provide solid results, camera-based 3D detection algorithms started to use the generated depth maps as additional input. However, this comes at the cost of an extra algorithm that consumes valuable on-board resources. To circumvent this issue, we propose an architecture that learns the concept of depth during training from additional ground truth. In operational mode, the 3D object detector no longer relies on depth maps as input. On the KITTI 3D object detection dataset, the proposed architecture achieves results comparable to architectures using additional depth inputs and outperforms image-only methods.
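The training scheme described in the abstract can be illustrated with a minimal sketch: a shared backbone feeds both a detection head and an auxiliary depth head, the depth head is supervised with extra ground truth during training, and at inference only the detection path is evaluated. This is a hypothetical PyTorch illustration under assumed layer sizes and names (`DepthAwareDetector`, `det_head`, `depth_head`), not the authors' actual architecture.

```python
# Hypothetical sketch of training-time auxiliary depth supervision,
# not the paper's exact architecture: a shared backbone with a 3D
# detection head plus a depth head that is used only while training.
import torch
import torch.nn as nn

class DepthAwareDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(           # shared feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(32, 8, 1)      # e.g. per-cell 3D box parameters
        self.depth_head = nn.Conv2d(32, 1, 1)    # auxiliary: dense depth prediction

    def forward(self, img, with_depth=False):
        feats = self.backbone(img)
        det = self.det_head(feats)
        if with_depth:                           # training mode: also predict depth
            return det, self.depth_head(feats)
        return det                               # operational mode: image in, boxes out

model = DepthAwareDetector()
img = torch.randn(2, 3, 64, 64)                  # dummy image batch
gt_depth = torch.rand(2, 1, 64, 64)              # additional depth ground truth

# Training step: the depth loss is added to the (omitted) detection loss,
# forcing the shared backbone to encode depth information implicitly.
det, depth = model(img, with_depth=True)
depth_loss = nn.functional.l1_loss(depth, gt_depth)
# total_loss = detection_loss + lambda_depth * depth_loss

# Inference: the depth head is never evaluated, so no extra depth
# estimation algorithm consumes on-board resources.
det_only = model(img)
```

The key design point is that the depth head adds cost only during training; once the backbone has learned depth-aware features, the head can be dropped without changing the detection path.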