Academic Paper

AuPPLE: Augmented Physical Priors through Language Enhancement using Self-Supervised Learning
Document Type
Conference
Source
2023 14th International Conference on Information and Communication Technology Convergence (ICTC), pp. 961-966, Oct. 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Fields, Waves and Electromagnetics
Power, Energy and Industry Applications
Signal Processing and Analysis
Transportation
Training
Grounding
Training data
Predictive models
Benchmark testing
Physics
Context modeling
physics
large language models
grounding
intuition
artificial intelligence
Language
ISSN
2162-1241
Abstract
In recent years, a contentious debate has emerged surrounding the degree to which Large Language Models (LLMs) can truly achieve grounding in the physical world. Grounding, in this context, refers to a model's ability to establish a meaningful connection between its language-based understanding and a concrete comprehension of real-world phenomena. Our research explores the latent capability of LLMs to develop physical intuition: a prerequisite for embodied agents to effectively perform tasks in real-world environments. In this paper, we release a novel dataset of physical scenarios that serves as a benchmark for an LLM's physical intuition. Our benchmark for language models, AuPPLE (Augmented Physical Priors through Language Enhancement), covers free-fall and projectile-motion scenarios posed in several question-answer formulations: MultiQA, binary classification, and continuous number prediction, which require models to comprehend linguistic nuances and apply their understanding within a physical context. By meticulously fine-tuning LLMs on this specialized dataset, we assess their performance in providing responses that draw upon underlying physical knowledge. With our fine-tuned LLMs achieving over 87% on the free-fall evaluation dataset, more than three times the accuracy of the base model, our results shed light on the intrinsic grounding capabilities of LLMs, offering insights into their potential to bridge the gap between language and the physical world. This paper contributes to the ongoing discourse on the true nature of LLMs' comprehension and its relationship with real-world context, underscoring the strides made in enhancing their intuitive understanding through targeted fine-tuning techniques.
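
To make the three question-answer formulations named in the abstract concrete, the following is a minimal sketch, not taken from the paper or its released dataset, of how a single free-fall scenario might be rendered as MultiQA, binary classification, and continuous number prediction items. The generator function, field names, answer choices, and rounding are illustrative assumptions; only the free-fall relation t = sqrt(2h/g) is standard physics.

import math
import random

G = 9.81  # gravitational acceleration (m/s^2)

def make_free_fall_item(rng: random.Random) -> dict:
    """Build one hypothetical free-fall scenario in the three
    question formats named in the abstract (MultiQA, binary
    classification, continuous number prediction)."""
    height = round(rng.uniform(5.0, 100.0), 1)      # drop height in meters
    fall_time = math.sqrt(2 * height / G)            # t = sqrt(2h/g), no air resistance
    context = (f"A ball is dropped from rest at a height of {height} m. "
               "Ignore air resistance.")

    # MultiQA: one correct choice among scaled distractors.
    choices = sorted({round(fall_time * f, 2) for f in (0.5, 1.0, 1.5, 2.0)})
    multi_qa = {
        "question": context + " How many seconds does it take to reach the ground?",
        "choices": [f"{c} s" for c in choices],
        "answer": f"{round(fall_time, 2)} s",
    }

    # Binary classification: has the ball landed after a stated duration?
    claimed = round(fall_time * rng.choice([0.5, 1.5]), 2)
    binary = {
        "question": context + f" After {claimed} s, has the ball hit the ground? (yes/no)",
        "answer": "yes" if claimed >= fall_time else "no",
    }

    # Continuous number prediction: regress the fall time directly.
    continuous = {
        "question": context + " Predict the fall time in seconds.",
        "answer": round(fall_time, 2),
    }
    return {"multi_qa": multi_qa, "binary": binary, "continuous": continuous}

if __name__ == "__main__":
    item = make_free_fall_item(random.Random(0))
    for fmt, qa in item.items():
        print(fmt, qa)

A generator along these lines would let the same physical scenario be evaluated at increasing precision, from recognizing the right option, to judging a yes/no claim, to predicting the numeric value itself.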