Academic Paper

Speeding-Up Action Learning in a Social Robot With Dyna-Q+: A Bioinspired Probabilistic Model Approach

Document Type

Periodical

Author

Maroto-Gomez, M.; Gonzalez, R.; Castro-Gonzalez, A.; Malfaz, M.; Salichs, M.A.

Source

IEEE Access. 9:98381-98397, 2021

Subject

Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Robots
Reinforcement learning
Task analysis
Collision avoidance
Navigation
Probabilistic logic
Stability analysis
Action learning
decision-making
human–robot interaction
probabilistic model
reinforcement learning
social robots

Language

ISSN

2169-3536

Abstract

Robotic systems developed for social and dynamic environments require adaptive mechanisms to operate successfully. Consequently, learning from rewards has provided meaningful results in applications involving human-robot interaction. When the robot’s state space and the number of actions are extensive, the dimensionality of the problem becomes intractable, which drastically slows down the learning process. This effect is especially noticeable in one-step temporal difference methods because only one update is performed per robot-environment interaction. In this paper, we show how the action-based learning of a social robot can be improved by combining classical temporal difference reinforcement learning methods, such as Q-learning or Q($\lambda$), with a probabilistic model of the environment. This architecture, which we have called Dyna, allows the robot to act and plan simultaneously using the experience obtained during real human-robot interactions. Principally, Dyna improves on classical algorithms in terms of convergence speed and stability, which strengthens the learning process. Hence, in this work we have embedded a Dyna architecture in our social robot, Mini, to endow it with the ability to autonomously maintain an optimal internal state while living in a dynamic environment.
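
The idea summarized in the abstract (one real temporal difference update per interaction, followed by extra planning updates drawn from a learned probabilistic model) can be illustrated with a minimal tabular sketch. This is not the authors' implementation: it follows the textbook form of Dyna-Q+ with an exploration bonus of kappa * sqrt(tau), and the class name `DynaQPlus`, its parameters, and the count-based model are assumptions made only for illustration. For brevity it assumes a continuing task (no terminal states) and plans only over state-action pairs that have actually been tried.

```python
import math
import random
from collections import defaultdict


class DynaQPlus:
    """Illustrative tabular Dyna-Q+ agent with a count-based transition model."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 kappa=0.001, planning_steps=10, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha                    # learning rate
        self.gamma = gamma                    # discount factor
        self.epsilon = epsilon                # exploration rate
        self.kappa = kappa                    # weight of the exploration bonus
        self.planning_steps = planning_steps  # simulated updates per real step
        self.rng = random.Random(seed)
        self.q = defaultdict(float)           # (state, action) -> value estimate
        # Probabilistic model: (state, action) -> {(reward, next_state): count}
        self.counts = defaultdict(lambda: defaultdict(int))
        self.last_tried = {}                  # (state, action) -> last real step
        self.t = 0                            # number of real interactions

    def act(self, state):
        """Epsilon-greedy action selection with random tie-breaking."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = [self.q[(state, a)] for a in range(self.n_actions)]
        best = max(values)
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def _update(self, s, a, r, s2):
        """One-step Q-learning update."""
        best_next = max(self.q[(s2, b)] for b in range(self.n_actions))
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def observe(self, s, a, r, s2):
        """Learn from one real transition, update the model, then plan."""
        self.t += 1
        self._update(s, a, r, s2)
        self.counts[(s, a)][(r, s2)] += 1
        self.last_tried[(s, a)] = self.t
        self._plan()

    def _plan(self):
        """Replay simulated transitions sampled from the learned model."""
        seen = list(self.counts)
        for _ in range(self.planning_steps):
            s, a = self.rng.choice(seen)
            outcomes = self.counts[(s, a)]
            # Sample an outcome in proportion to how often it was observed.
            r, s2 = self.rng.choices(list(outcomes),
                                     weights=list(outcomes.values()))[0]
            # Bonus grows with the time since (s, a) was last tried for real.
            bonus = self.kappa * math.sqrt(self.t - self.last_tried[(s, a)])
            self._update(s, a, r + bonus, s2)
```

In use, the agent would call `act(state)` to choose an action, execute it, and then pass the observed transition to `observe(state, action, reward, next_state)`. Each call performs one real update plus `planning_steps` simulated ones, which is where the faster convergence reported for Dyna-style methods comes from.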

