Academic Paper

CLIPGraphs: Multimodal Graph Networks to Infer Object-Room Affinities
Document Type
Conference
Source
2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 2604-2609, Aug. 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Keywords
Knowledge engineering
Navigation
Semantics
Knowledge graphs
Organizations
Benchmark testing
Robustness
Commonsense knowledge
graph convolutional network
knowledge graph
large language models
scene rearrangement
Language
English
ISSN
1944-9437
Abstract
This paper introduces a novel method for determining the best room in which to place an object, for embodied scene rearrangement. While state-of-the-art approaches rely on large language models (LLMs) or reinforcement learning (RL) policies for this task, our approach, CLIPGraphs, efficiently combines commonsense domain knowledge, data-driven methods, and recent advances in multimodal learning. Specifically, it (a) encodes a knowledge graph of prior human preferences about the room location of different objects in home environments, (b) incorporates vision-language features to support multimodal queries based on images or text, and (c) uses a graph network to learn object-room affinities based on embeddings of the prior knowledge and the vision-language features. We demonstrate that our approach provides better estimates of the most appropriate location of objects from a benchmark set of object categories compared with state-of-the-art baselines.
Supplementary material and code: https://clipgraphs.github.io
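To make the abstract's pipeline concrete, the following is a minimal PyTorch sketch of the general technique it names: a graph convolutional network propagating CLIP-style node embeddings over a knowledge graph of object and room nodes, with object-room affinities scored by embedding similarity. All class names, layer counts, dimensions, and the cosine-similarity scoring head are illustrative assumptions, not the paper's exact architecture (see the linked supplementary code for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch, not the authors' implementation: a two-layer GCN over
# object/room nodes whose features are CLIP-style embeddings, followed by a
# cosine-similarity affinity head.

class GCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        # adj_norm: symmetrically normalized adjacency with self-loops.
        return F.relu(adj_norm @ self.linear(x))

def normalize_adjacency(adj):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a = adj + torch.eye(adj.size(0))
    deg_inv_sqrt = a.sum(dim=1).pow(-0.5)   # degree >= 1 due to self-loops
    return deg_inv_sqrt[:, None] * a * deg_inv_sqrt[None, :]

class AffinityGCN(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.gcn1 = GCNLayer(feat_dim, hidden_dim)
        self.gcn2 = GCNLayer(hidden_dim, hidden_dim)

    def forward(self, node_feats, adj, object_idx, room_idx):
        a = normalize_adjacency(adj)
        h = self.gcn2(self.gcn1(node_feats, a), a)
        # Cosine similarity between object and room embeddings as affinity.
        obj = F.normalize(h[object_idx], dim=-1)
        room = F.normalize(h[room_idx], dim=-1)
        return obj @ room.t()   # [num_objects, num_rooms]

# Toy usage: 6 nodes (4 objects, 2 rooms) with 512-d CLIP-like features.
feats = torch.randn(6, 512)              # stand-in for CLIP image/text embeddings
adj = (torch.rand(6, 6) > 0.5).float()   # stand-in for prior-knowledge edges
adj = ((adj + adj.t()) > 0).float()      # symmetrize
model = AffinityGCN()
scores = model(feats, adj,
               object_idx=torch.arange(4),
               room_idx=torch.tensor([4, 5]))
best_room = scores.argmax(dim=1)         # most appropriate room per object
```

In this reading, the multimodal aspect enters through the node features (CLIP embeds both images and text into the same space, so a query can be either), while the graph structure injects the encoded human preferences; the GCN then blends the two sources into affinity-bearing embeddings.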