Academic Paper

Learning Visual-Audio Representations for Voice-Controlled Robots

Document Type

Conference

Author

Chang, Peixin; Liu, Shuijing; McPherson, D. Livingston; Driggs-Campbell, Katherine

Source

2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 9508-9514, May 2023

Subject

Robotics and Control Systems
Representation learning
Reactive power
Automation
Pipelines
Reinforcement learning
Robot sensing systems
Task analysis

Language

Abstract

Building on recent advances in representation learning, we propose a novel pipeline for task-oriented voice-controlled robots with raw sensor inputs. Previous methods rely on a large number of labels and task-specific reward functions. Such approaches are hard to improve after deployment and generalize poorly across robotic platforms and tasks. To address these problems, our pipeline first learns a visual-audio representation (VAR) that associates images and sound commands. The robot then learns to fulfill the sound command via reinforcement learning, using the reward generated by the VAR. We demonstrate our approach with various sound types, robots, and tasks, and show that our method outperforms previous work with far fewer labels. In both simulated and real-world experiments, we show that the system can self-improve in previously unseen scenarios given a reasonable amount of newly labeled data.
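
The abstract does not say how the VAR produces a reward. As a minimal sketch only, assuming the VAR maps an image and a sound command into a shared embedding space and that the reward is the cosine similarity between the two embeddings, the idea could look like the following. The function names `cosine_similarity` and `var_reward` and the choice of cosine similarity are illustrative assumptions, not the authors' method.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def var_reward(image_embedding: Sequence[float],
               sound_embedding: Sequence[float]) -> float:
    """Hypothetical reward: higher when the current image embedding is
    closer to the sound-command embedding (assumed shared space)."""
    return cosine_similarity(image_embedding, sound_embedding)
```

Under this assumption, a reinforcement learning agent would receive a larger reward as its camera view comes to match the spoken command, without a hand-written, task-specific reward function.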

