Journal Article

ActiveZero++: Mixed Domain Learning Stereo and Confidence-Based Depth Completion With Zero Annotation
Document Type
Periodical
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12):14098-14113, Dec. 2023
Subject
Computing and Processing
Bioengineering
Cameras
Stereo vision
Lighting
Training
Costs
Annotations
Shape
Depth completion
learning-based stereo
sensors
Sim2Real
Language
English
ISSN
0162-8828
2160-9292
1939-3539
Abstract
Learning-based stereo methods usually require a large-scale dataset with depth annotations; obtaining accurate depth in the real domain is difficult, whereas ground-truth depth is readily available in the simulation domain. In this article we propose a new framework, ActiveZero++, a mixed-domain learning solution for active stereo vision systems that requires no real-world depth annotation. In the simulation domain, we use a combination of a supervised disparity loss and a self-supervised loss on a shape-primitives dataset. By contrast, in the real domain, we use only a self-supervised loss on a dataset that is out-of-distribution with respect to both the training simulation data and the test real data. To improve the robustness and accuracy of our reprojection loss in hard-to-perceive regions, our method introduces a novel self-supervised loss called temporal IR reprojection. Further, we propose a confidence-based depth completion module, which uses the confidence from the stereo network to identify and improve erroneous areas in the depth prediction through depth-normal consistency. Extensive qualitative and quantitative evaluations on real-world data demonstrate state-of-the-art results that can even outperform a commercial depth sensor. Furthermore, our method can significantly narrow the Sim2Real domain gap of depth maps for state-of-the-art learning-based 6D pose estimation algorithms.
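
The sketch below is a minimal, illustrative rendering (not the authors' code) of the mixed-domain objective the abstract describes: a supervised disparity loss on simulated pairs with ground truth, plus a self-supervised IR reprojection loss applied to both simulated and real pairs so that real data needs no depth labels. Names such as stereo_net and the 0.1 weighting are assumptions for illustration, not values from the paper.

import torch
import torch.nn.functional as F


def reprojection_loss(ir_left, ir_right, disparity):
    """Photometric self-supervision: warp the right IR image into the
    left view with the predicted disparity and compare to the left image."""
    b, _, h, w = ir_left.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=ir_left.device, dtype=torch.float32),
        torch.arange(w, device=ir_left.device, dtype=torch.float32),
        indexing="ij",
    )
    # Shift x-coordinates by the disparity, then normalize to [-1, 1]
    # as required by grid_sample.
    x_src = xs.unsqueeze(0) - disparity.squeeze(1)
    grid = torch.stack(
        (2.0 * x_src / (w - 1) - 1.0,
         2.0 * ys.unsqueeze(0).expand(b, -1, -1) / (h - 1) - 1.0),
        dim=-1,
    )
    warped = F.grid_sample(ir_right, grid, align_corners=True)
    return F.l1_loss(warped, ir_left)


def mixed_domain_loss(stereo_net, sim_batch, real_batch, w_self=0.1):
    """Supervised loss on simulation only; self-supervised reprojection
    loss on both domains, so no real-world depth annotation is used."""
    sim_disp = stereo_net(sim_batch["ir_left"], sim_batch["ir_right"])
    real_disp = stereo_net(real_batch["ir_left"], real_batch["ir_right"])

    loss_sup = F.smooth_l1_loss(sim_disp, sim_batch["gt_disparity"])
    loss_self = (
        reprojection_loss(sim_batch["ir_left"], sim_batch["ir_right"], sim_disp)
        + reprojection_loss(real_batch["ir_left"], real_batch["ir_right"], real_disp)
    )
    return loss_sup + w_self * loss_self

The paper's temporal IR reprojection and confidence-based depth completion refine this basic recipe; the sketch only covers the shared supervised/self-supervised structure of the mixed-domain training.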