Academic Paper

Behavioral Repertoire via Generative Adversarial Policy Networks
Document Type
Periodical
Source
IEEE Transactions on Cognitive and Developmental Systems (IEEE Trans. Cogn. Dev. Syst.). 14(4):1344-1355, Dec. 2022
Subject
Computing and Processing
Signal Processing and Analysis
Robots
Task analysis
Generative adversarial networks
Generators
Machine learning
Neural networks
Learning systems
Robot learning
Language
ISSN
2379-8920
2379-8939
Abstract
Learning algorithms are enabling robots to solve increasingly challenging real-world tasks. These approaches often rely on demonstrations and reproduce the behavior shown. Unexpected changes in the environment or in robot morphology may require using different behaviors to achieve the same effect, for instance, to reach and grasp an object in changing clutter. An emerging paradigm addressing this robustness issue is to learn a diverse set of successful behaviors for a given task, from which a robot can select the most suitable policy when faced with a new environment. In this article, we explore a novel realization of this vision by learning a generative model over policies. Rather than learning a single policy, or a small fixed repertoire, our generative model for policies compactly encodes an unbounded number of policies and allows novel controller variants to be sampled. Leveraging our generative policy network, a robot can sample novel behaviors until it finds one that works for a new scenario. We demonstrate this idea with an application of robust ball throwing in the presence of obstacles, as well as joint-damage-robust throwing. We show that this approach achieves a greater diversity of behaviors than an existing evolutionary approach, while maintaining good efficacy of sampled behaviors, allowing a Baxter robot to hit targets more often when ball throwing in the presence of varying obstacles or joint impediments.
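The sample-until-success idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_generator`, `toy_scenario`, the two-dimensional latent space, and the trial budget are all hypothetical stand-ins chosen only to make the loop runnable. In the paper's setting the generator would be a trained generative policy network and the success check would be an actual throw attempt.

```python
import random
from typing import Callable, List, Optional, Tuple

Policy = List[float]


def toy_generator(z: List[float]) -> Policy:
    """Hypothetical stand-in for a trained generator.

    Maps a latent vector z to policy parameters with a fixed linear map;
    a real generator would be a trained neural network.
    """
    return [0.5 * z[0] + 0.1 * z[1], -0.3 * z[0] + 0.8 * z[1]]


def toy_scenario(policy: Policy) -> bool:
    """Hypothetical success check for a new scenario.

    Assumption: the behavior "works" when the first parameter exceeds a
    threshold (for example, a throw that clears an obstacle).
    """
    return policy[0] > 0.5


def sample_until_success(
    generator: Callable[[List[float]], Policy],
    succeeds: Callable[[Policy], bool],
    latent_dim: int = 2,
    max_trials: int = 100,
    seed: int = 0,
) -> Tuple[Optional[Policy], int]:
    """Draw latent vectors, generate policies, and stop at the first success.

    Returns (policy, trials_used), or (None, max_trials) if no sampled
    policy succeeded within the budget.
    """
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        policy = generator(z)
        if succeeds(policy):
            return policy, trial
    return None, max_trials


policy, trials = sample_until_success(toy_generator, toy_scenario)
print(policy, trials)
```

Capping the number of trials is a design choice of this sketch, not something stated in the abstract; it keeps the loop from running indefinitely when no sampled behavior suits the scenario.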