Academic Paper

Learning Optimal Parameterized Policy for High Level Strategies in a Game Setting

Document Type

Conference

Author

Prakash, Ravi; Vohra, Mohit; Behera, Laxmidhar

Source

2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 1-6, Oct. 2019

Subject

Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Transportation

Language

ISSN

1944-9437

Abstract

Complex and interactive robot manipulation skills, such as playing a game of table tennis against a human opponent, pose a multifaceted challenge and a novel problem. Accurate trajectory generation in such dynamic situations and an appropriate controller for responding to the opponent's incoming ball are only prerequisites for winning the game. Decision making is a major part of an intelligent robot, and a policy is needed to choose and execute the action that receives the highest reward. In this paper, we address the important problem of how to learn the higher-level optimal strategies that enable competitive behaviour against humans in such an interactive game setting. This paper presents a novel technique to learn a higher-level strategy for the game of table tennis using P-Q Learning (a mixture of Pavlovian learning and Q-learning) to learn a parameterized policy. The cooperative learning framework of a Kohonen Self-Organizing Map (KSOM), along with Replay Memory, is employed for faster strategy learning in this short-horizon problem. The strategy is learnt in simulation, using a simulated human opponent and an ideal robot that can perform hitting motions accurately within its workspace. We show that our method significantly improves the average received reward in comparison with other state-of-the-art methods.
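
As a rough illustration of two ingredients named in the abstract, Q-learning and replay memory, the sketch below runs a plain tabular Q-learning update over transitions sampled from a small replay buffer. It is not the paper's P-Q Learning and contains no Pavlovian component, KSOM, or parameterized policy, since this record gives no details of those. The one-step toy "game" (three incoming-ball states, three return actions, and a made-up reward rule) and the names `q_update` and `train` are assumptions introduced only for this example.

```python
import random
from collections import deque


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a dict-of-lists Q-table."""
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def train(num_episodes=2000, n_states=3, n_actions=3, batch_size=8, seed=0):
    """Learn a toy one-step strategy using epsilon-greedy Q-learning with replay."""
    rng = random.Random(seed)
    q = {s: [0.0] * n_actions for s in range(n_states)}
    memory = deque(maxlen=500)  # replay memory of recent transitions

    for _ in range(num_episodes):
        # Hypothetical state: which of three zones the incoming ball lands in.
        state = rng.randrange(n_states)

        # Epsilon-greedy choice of a return action.
        if rng.random() < 0.2:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])

        # Made-up reward rule (assumption): returning to the "next" zone wins.
        reward = 1.0 if action == (state + 1) % n_actions else 0.0

        # Each rally is treated as a single-step episode (short horizon).
        memory.append((state, action, reward, state, True))

        # Replay a random mini-batch of stored transitions.
        batch = rng.sample(list(memory), min(batch_size, len(memory)))
        for s, a, r, s2, d in batch:
            q_update(q, s, a, r, s2, d)

    return q


if __name__ == "__main__":
    q_table = train()
    for s, values in q_table.items():
        print(s, [round(v, 2) for v in values])
```

Replaying stored transitions lets each observed rally contribute to several updates, which is one common reason replay memory is used when experience is scarce.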

