Academic Paper

Acting with Language [electronic resource]
Document Type
Theses
Source
Dissertation Abstracts International; 85-01B.
Subject
Robotics
Computer Vision
Grounding
Manipulation
Natural Language Processing
Robot learning
Language
English
Abstract
How can we imbue robots with the ability to achieve arbitrary goals in novel environments? Language provides a natural interface for guiding robots and abstracting away the complexities of the physical world. Previous attempts to guide robots with language often rely on human-designed intermediate representations, such as object detections, categories, poses, and symbolic states. These representations struggle to capture everyday objects such as deformable shirts, coffee beans, ropes, and cherry stems.

One solution that does not require human-designed representations is end-to-end deep learning, which directly maps camera observations to robot actions. While learning approaches are vastly more expressive than traditional methods, they are severely bottlenecked by the lack of training data in robotics. Training even a simple policy could take months of data collection, which does not scale. However, robot data contains spatial symmetries and other structural priors that can be exploited to efficiently learn policies for a wide range of tasks.

In this thesis, we present several methods for using language to guide robot actions through end-to-end learning. First, we present ALFRED, a large-scale dataset and benchmark for evaluating agents that follow language instructions in partially observable household environments. Next, we introduce CLIPort and PerAct, two language-conditioned manipulation frameworks that aim to replicate in robotics the success of large pre-trained vision and language models; both use spatial priors to efficiently learn action representations from limited data. Lastly, we discuss ALFWorld, a framework for learning "textual policies" in interactive text games, thereby avoiding the visual and physical complexities of interacting with embodied environments. We conclude with a discussion of counterpoints, limitations, and potential future directions for scaling up robot learning and butler robots.
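As a rough illustration of the end-to-end, language-conditioned setup described in the abstract, the following minimal PyTorch sketch maps a camera observation and an instruction embedding to an action. Every detail here (the class name, encoder layout, dimensions, and the 7-DoF action output) is an assumption for illustration only; it is not the CLIPort or PerAct architecture presented in the thesis.

```python
# Hypothetical sketch of a language-conditioned visuomotor policy:
# maps (camera observation, instruction embedding) -> action vector.
# All names and dimensions are illustrative assumptions, not the
# architectures (CLIPort, PerAct) actually presented in the thesis.
import torch
import torch.nn as nn

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, lang_dim=512, action_dim=7):
        super().__init__()
        # Small convolutional encoder for RGB camera observations.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse visual features with a precomputed instruction embedding
        # (e.g., from a frozen pre-trained language model).
        self.head = nn.Sequential(
            nn.Linear(64 + lang_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, rgb, lang_emb):
        feat = self.vision(rgb)                      # (B, 64)
        fused = torch.cat([feat, lang_emb], dim=-1)  # (B, 64 + lang_dim)
        return self.head(fused)                      # (B, action_dim)

# Usage: one forward pass on dummy data.
policy = LanguageConditionedPolicy()
rgb = torch.randn(1, 3, 128, 128)  # camera image
lang = torch.randn(1, 512)         # instruction embedding
action = policy(rgb, lang)         # e.g., a 7-DoF end-effector command
```

In practice, such a policy would be trained by imitation on demonstration data, which is exactly the data-hungry regime the thesis's spatial priors are meant to mitigate.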