Academic Article

MonoIS3DLoc: Simulation to Reality Learning Based Monocular Instance Segmentation to 3D Objects Localization From Aerial View
Document Type
Periodical
Source
IEEE Access, 11:64170-64184, 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Three-dimensional displays
Location awareness
Cameras
Solid modeling
Drones
Neural networks
Deep learning
Monocular camera
Sim2Real
3D objects localization
deep learning
aerial view
Language
English
ISSN
2169-3536
Abstract
3D object detection and localization based on only a monocular camera is fundamentally ill-posed, since 3D information cannot be fully recovered from a single view. In combination with deep neural networks, recent research has shown encouraging results in tackling this issue. However, most of these methods target street-view cameras, rely on a few small publicly available datasets, and their 3D prediction accuracy remains low compared with traditional estimation methods using stereo cameras. With the growth of drone delivery applications in urban spaces, a comparable method is needed to detect objects and estimate their 3D positions from an aerial view. We proposed a novel Simulation-to-Reality (Sim2Real) approach to predict an object's 3D position from an aerial view. An instance segmentation of the object is used as an intermediate representation, both to generate a very large training dataset in simulation and to minimize the gap between simulation and reality. We designed a feed-forward neural network that predicts the 3D position from the instance segmentation and integrated it with a range-attention classification to improve accuracy, especially for 3D object detection at far distances. To evaluate our method, we created two simulation datasets: one for cross-validation against other state-of-the-art methods and the other for practical experiments on a real drone with a monocular camera. The experimental results demonstrate that, on the same KITTI-3D dataset, we not only achieve better accuracy than state-of-the-art monocular methods but also come close to the accuracy of a stereo-based technique. Since our model is lightweight, we successfully deployed it on the drone's companion computer, and the results of the practical experiments are promising.
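The abstract describes the architecture only at a high level: a feed-forward network that regresses an object's 3D position from its instance segmentation mask, combined with a range-attention classification branch to help with distant objects. The paper's actual network is not given here; the following is a minimal PyTorch sketch of that idea, assuming a small CNN encoder over a binary mask with two output heads. All names (Mask3DLocNet, num_range_bins) and layer sizes are hypothetical, not the authors' design.

```python
import torch
import torch.nn as nn

class Mask3DLocNet(nn.Module):
    """Hypothetical sketch: encode a binary instance mask, then predict
    (a) the object's 3D position and (b) a coarse distance-range class."""

    def __init__(self, num_range_bins=4):
        super().__init__()
        # Small CNN encoder over a 1-channel H x W instance segmentation mask.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Regression head: (x, y, z) position in the camera frame.
        self.pos_head = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 3),
        )
        # Classification head over coarse distance bins (a stand-in for
        # the paper's "range-attention classification").
        self.range_head = nn.Linear(64, num_range_bins)

    def forward(self, mask):
        feat = self.encoder(mask)
        return self.pos_head(feat), self.range_head(feat)

# Usage with a dummy binary mask of one segmented object.
model = Mask3DLocNet()
mask = torch.zeros(1, 1, 128, 128)
mask[0, 0, 40:70, 50:90] = 1.0
xyz, range_logits = model(mask)
print(xyz.shape, range_logits.shape)  # torch.Size([1, 3]) torch.Size([1, 4])
```

In this reading, the range head predicts a coarse distance bin whose classification loss would complement the position regression loss during training; the exact form of the paper's range-attention mechanism may differ.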