Academic Paper

A Large Model’s Ability to Identify 3D Objects as a Function of Viewing Angle
Document Type
Conference
Source
2024 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR), pp. 281-288, Jan. 2024
Subject
Computing and Processing
Solid modeling
Three-dimensional displays
Virtual reality
Cameras
Object recognition
Artificial intelligence
Robots
Multimodal interaction
Virtual Reality
CLIP
3D Models
Language
ISSN
2771-7453
Abstract
Virtual reality is increasingly used to support embodied AI agents, such as robots, which frequently engage in ‘sim-to-real’ based learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that tie into a wide variety of tasks. In order to understand how such agents can learn from simulated environments, we explore a language model’s ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from a fixed set of 420 virtual camera perspectives. A well-studied image and language model (CLIP) was used to generate text (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object.
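The pipeline described in the abstract has two computational steps: sampling a fixed set of camera perspectives around each 3D model, and scoring each rendered view against candidate object-category text embeddings. The sketch below illustrates both steps under stated assumptions: the 21 × 20 = 420 azimuth/elevation decomposition is hypothetical (the paper states only that 420 fixed perspectives were used), and the embeddings here are placeholder vectors standing in for CLIP's image and text encoders, whose outputs are compared by cosine similarity.

```python
import math

def camera_poses(n_azimuth=21, n_elevation=20, radius=1.0):
    """Sample camera positions on a sphere centered on the object.

    The 21 x 20 = 420 grid is an assumed decomposition of the paper's
    "fixed set of 420 virtual camera perspectives".
    """
    poses = []
    for i in range(n_azimuth):
        az = 2.0 * math.pi * i / n_azimuth          # azimuth in [0, 2*pi)
        for j in range(n_elevation):
            # Offset by 0.5 so samples avoid the degenerate poles.
            el = math.pi * (j + 0.5) / n_elevation - math.pi / 2.0
            poses.append((radius * math.cos(el) * math.cos(az),
                          radius * math.cos(el) * math.sin(az),
                          radius * math.sin(el)))
    return poses

def classify(image_emb, text_embs):
    """Return the label whose text embedding is most cosine-similar
    to the image embedding -- the zero-shot scoring rule CLIP uses.
    Embeddings here are plain lists standing in for encoder outputs.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return max(text_embs, key=lambda label: cos(image_emb, text_embs[label]))
```

In the study's setting, `classify` would be evaluated once per rendered perspective, so accuracy can be aggregated per camera position to reveal which viewing angles yield reliable categorizations.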