Academic Paper

Understanding 3D Semantic Structure around the Vehicle with Monocular Cameras
Document Type
Conference
Source
2018 IEEE Intelligent Vehicles Symposium (IV), pp. 132-137, Jun. 2018
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Cameras
Semantics
Three-dimensional displays
Image segmentation
Estimation
Task analysis
Two dimensional displays
Language
English
Abstract
In this paper, we propose a method to recognize the semantic and geometric structure of a traffic scene using monocular cameras. We designed Deep Neural Networks (DNNs) for semantic segmentation and depth estimation and trained them using data collected with a test vehicle on which a 360-degree panoramic camera system and a LIDAR are mounted. The collected images were manually annotated for semantic segmentation. Experimental results show that the trained DNNs can accurately classify each pixel and accurately estimate the depth of each pixel of images in the validation data: the global average accuracy of semantic segmentation reached 96.4%, while the overall accuracy of depth estimation was 88.7%. Generalization capability for both tasks was also tested with DNNs trained only on front-facing camera images; semantic segmentation and depth estimation were still executed successfully, at slightly lower accuracy. We also developed a novel interface using a head-mounted display that enables us to evaluate the estimation results intuitively and check how well the proposed DNNs perform.
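The record does not describe the network architecture or how the two reported accuracies are defined. As a rough illustration only, the PyTorch sketch below pairs a shared encoder with a segmentation head and a depth head, and includes the two metrics in commonly used forms. The class and function names (MultiTaskNet, global_pixel_accuracy, depth_accuracy) and the threshold-based depth metric are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Hypothetical sketch: a shared convolutional encoder feeding two
    decoder heads, one for per-pixel semantic classes and one for
    per-pixel depth. The paper's actual architecture is not given here."""

    def __init__(self, num_classes=10):
        super().__init__()
        # Shared encoder: downsample the input image twice.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Segmentation head: upsample back and emit class logits per pixel.
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )
        # Depth head: upsample back and emit one non-negative depth per pixel.
        self.depth_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Softplus(),
        )

    def forward(self, x):
        feats = self.encoder(x)
        return self.seg_head(feats), self.depth_head(feats)

def global_pixel_accuracy(pred_labels, gt_labels):
    """Fraction of correctly classified pixels, one common reading of the
    'global average' figure reported in the abstract."""
    return (pred_labels == gt_labels).float().mean().item()

def depth_accuracy(pred_depth, gt_depth, threshold=1.25):
    """Fraction of pixels with max(pred/gt, gt/pred) below a threshold,
    a standard monocular-depth metric; the paper's exact definition of
    'overall accuracy' is not stated in this record."""
    ratio = torch.max(pred_depth / gt_depth, gt_depth / pred_depth)
    return (ratio < threshold).float().mean().item()
```

A shared encoder with task-specific heads is one standard way to couple segmentation and depth estimation on the same images; the paper may instead train two independent networks, which the abstract alone does not resolve.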