Academic Journal Article

Panoptic Vision-Language Feature Fields
Document Type
Periodical
Source
IEEE Robotics and Automation Letters, 9(3):2144-2151, Mar. 2024
Subject
Robotics and Control Systems
Computing and Processing
Components, Circuits, Devices and Systems
Semantics
Three-dimensional displays
Semantic segmentation
Self-supervised learning
Instance segmentation
Image reconstruction
Computational modeling
Semantic scene understanding
deep learning for visual perception
3D open vocabulary panoptic segmentation
neural implicit representation
Language
English
ISSN
2377-3766
2377-3774
Abstract
Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods can segment scenes into arbitrary classes based on text descriptions provided at runtime. In this letter, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model, and jointly fits an instance feature field through contrastive learning using 2D instance segments on input frames. Despite not being trained on the target classes, our method achieves panoptic segmentation performance comparable to state-of-the-art closed-set 3D systems on the HyperSim, ScanNet, and Replica datasets, and additionally outperforms current 3D open-vocabulary systems in semantic segmentation. We ablate the components of our method to demonstrate the effectiveness of our model architecture.
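To make the two jointly trained objectives described in the abstract concrete, below is a minimal PyTorch sketch of that kind of joint loss. This is an illustration, not the authors' implementation: the names (rendered_sem, vlm_feats, rendered_inst, mask_ids, temperature), the cosine form of the distillation term, and the SupCon-style contrastive term are all assumptions; the paper's exact loss definitions may differ.

```python
# Hedged sketch of a PVLFF-style joint objective. Assumes a NeRF-style
# renderer has already produced per-ray semantic features (to be distilled
# from a pretrained 2D vision-language model) and per-ray instance features
# (supervised contrastively by 2D instance segments). All names hypothetical.
import torch
import torch.nn.functional as F

def distillation_loss(rendered_sem, vlm_feats):
    # Cosine distillation: pull rendered semantic features toward the
    # pretrained 2D vision-language features at the same pixels.
    return 1.0 - F.cosine_similarity(rendered_sem, vlm_feats, dim=-1).mean()

def instance_contrastive_loss(rendered_inst, mask_ids, temperature=0.1):
    # Supervised-contrastive loss over a batch of rays: rays falling inside
    # the same 2D instance segment are positives, all other rays negatives.
    feats = F.normalize(rendered_inst, dim=-1)            # (N, D)
    sim = feats @ feats.T / temperature                   # (N, N) similarities
    self_mask = torch.eye(len(feats), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, -1e9)                # exclude self-pairs
    pos = (mask_ids[:, None] == mask_ids[None, :]).float()
    pos = pos.masked_fill(self_mask, 0.0)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    denom = pos.sum(dim=1).clamp(min=1.0)                 # rays w/o positives
    return -((pos * log_prob).sum(dim=1) / denom).mean()

# Toy usage: random tensors stand in for one batch of rendered rays.
N, D_sem, D_inst = 128, 512, 16
rendered_sem  = torch.randn(N, D_sem, requires_grad=True)
vlm_feats     = torch.randn(N, D_sem)       # e.g. per-pixel CLIP/LSeg features
rendered_inst = torch.randn(N, D_inst, requires_grad=True)
mask_ids      = torch.randint(0, 8, (N,))   # per-ray 2D instance segment IDs

loss = distillation_loss(rendered_sem, vlm_feats) \
       + instance_contrastive_loss(rendered_inst, mask_ids)
loss.backward()
```

The key design point the sketch captures is that the semantic branch needs no class labels (it regresses features from a pretrained 2D model, enabling open-vocabulary queries at test time), while the instance branch needs only class-agnostic 2D segments, so neither branch is trained on the target classes.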