Academic Paper

TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval
Document Type
Conference
Source
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 5803-5813, Jan. 2024
Subject
Computing and Processing
Representation learning
Solid modeling
Computer vision
Three-dimensional displays
Systematics
Shape
Buildings
Algorithms
Vision + language and/or other modalities
3D computer vision
Language
ISSN
2642-9381
Abstract
Text-to-shape retrieval is an increasingly relevant problem with the growth of 3D shape data. Recent work on contrastive losses for learning joint embeddings over multimodal data [45] has been successful at tasks such as retrieval and classification. Thus far, work on joint representation learning for 3D shapes and text has focused on improving embeddings through modeling of complex attention between representations [53], or multi-task learning [25]. We propose a trimodal learning scheme over text, multi-view images and 3D shape voxels, and show that with large batch contrastive learning we achieve good performance on text-to-shape retrieval without complex attention mechanisms or losses. Our experiments serve as a foundation for follow-up work on building trimodal embeddings for text-image-shape.
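The trimodal scheme described in the abstract can be sketched as pairwise contrastive (InfoNCE-style) losses over the three modalities. The following is a minimal illustration, not the authors' implementation: the function names (`info_nce`, `trimodal_loss`), the temperature value, and the equal weighting of the three modality pairs are all assumptions for the sake of the example.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    a, b: (N, D) arrays where row i of `a` is paired with row i of `b`.
    Matching (diagonal) pairs are contrasted against all other rows
    in the batch, which is why large batches help.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N) cosine similarities / T
    labels = np.arange(len(a))

    def xent(l):
        # numerically stable softmax cross-entropy on the diagonal
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # average over both retrieval directions (a->b and b->a)
    return 0.5 * (xent(logits) + xent(logits.T))

def trimodal_loss(text_emb, image_emb, voxel_emb, temperature=0.07):
    """Sum of pairwise contrastive losses over the three modalities
    (text, multi-view image, voxel) -- an assumed equal weighting."""
    return (info_nce(text_emb, image_emb, temperature)
            + info_nce(text_emb, voxel_emb, temperature)
            + info_nce(image_emb, voxel_emb, temperature))
```

At retrieval time, a text query embedded into this joint space can be matched against shape embeddings by cosine similarity, with no cross-modal attention required.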