Academic Article

3-D Semantic Terrain Reconstruction of Monocular Close-Up Images of Martian Terrains
Document Type
Periodical
Source
IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-16, 2024
Subject
Geoscience
Signal Processing and Analysis
Semantics
Three-dimensional displays
Cameras
Semantic segmentation
Mars
Image segmentation
Robots
Deep learning
depth prediction
environmental awareness
semantic segmentation
terrain reconstruction
Language
English
ISSN
0196-2892
1558-0644
Abstract
The Martian surface, as a typical unstructured terrain, is extremely challenging for Mars exploration missions. Rovers commonly rely on multiple sensors, such as depth cameras, range finders, and other devices, to explore such harsh environments. However, the onboard payload, power, and storage of rovers are insufficient for high-level stereoscopic perception, which hinders downstream tasks such as visual navigation and scientific exploration. To this end, in this article, we propose a lightweight, high-level perception framework that uses only close-up monocular images to perform semantic three-dimensional (3-D) reconstruction of Martian landforms. The framework consists of two parts. The first is a semantic segmentation module based on the proposed real-time Mars terrain segmentation (RMTS) network, which extracts intraclass and interclass contexts through local supervision. The second is a depth generation module based on a dual-encoder pix2pix network, which encodes the visual and semantic information of monocular images simultaneously. To validate the proposed framework, we construct a Martian planar-stereo dataset based on AI4Mars, an open-source semantic segmentation dataset for the Martian surface; it contains mutually aligned monocular close-up Martian images, semantic images, and depth images. After training, the proposed semantic segmentation model reaches 84.0% mean intersection over union (mIoU) at 152.2 FPS on a single RTX6000-24GB GPU. For depth generation, the absolute relative error between generated and ground-truth depth images is 0.367, the root-mean-square error is 0.510, and the accuracy is 0.753, at 42.9 FPS. The overall environment perception scheme runs at 9.5 FPS.
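The depth-generation figures quoted in the abstract (absolute relative error, root-mean-square error, and accuracy) are not defined in this record; the sketch below assumes the standard monocular depth evaluation conventions, with the threshold accuracy δ < 1.25 taken as the "accuracy" metric. The function name and the threshold are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6):
    """Common monocular-depth errors, computed over valid ground-truth pixels.

    Assumes the conventional definitions: AbsRel, RMSE, and the
    delta < 1.25 threshold accuracy.
    """
    valid = gt > eps                  # ignore pixels with no ground-truth depth
    pred, gt = pred[valid], gt[valid]

    abs_rel = np.mean(np.abs(pred - gt) / gt)        # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))        # root-mean-square error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)                   # threshold accuracy (delta < 1.25)
    return abs_rel, rmse, delta1
```

Under these conventions, the reported values 0.367, 0.510, and 0.753 would correspond to abs_rel, rmse, and delta1, respectively.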