Academic Paper

Comparing different methods to speed up reinforcement learning in a complex domain
Document Type
Conference
Source
2005 IEEE International Conference on Systems, Man and Cybernetics, vol. 4, pp. 3185-3190, 2005
Subject
Robotics and Control Systems
Computing and Processing
Components, Circuits, Devices and Systems
Learning
Robots
Algorithm design and analysis
Convergence
Reinforcement Learning
MDP
Q-Learning
options
SMDP homomorphisms
SMDP
Language
English
ISSN
1062-922X
Abstract
We introduce a new learning algorithm (the semi-DP algorithm) designed for MDPs (Markov decision processes) in which every action leads either to a deterministic successor state or to the terminal state. The algorithm needs only a finite number of loops to converge exactly to the optimal action-value function. In a soccer grid-world, we compare this algorithm, together with three other methods for speeding up or simplifying the learning process, against ordinary Q-learning. Furthermore, we show that different reward functions can considerably change the convergence time of the learning algorithms even when the optimal policy remains unchanged.
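
Note: the abstract does not spell out the semi-DP backup itself. What follows is a minimal Python sketch of the kind of dynamic-programming update that converges exactly in finitely many sweeps on such MDPs, under the assumption of an environment interface step(s, a) -> (next_state, reward) where next_state is None signals termination. All identifiers here (semi_dp_sketch, step, the corridor toy) are hypothetical illustrations, not the authors' code.

    def semi_dp_sketch(states, actions, step, gamma=0.9):
        """Hypothetical dynamic-programming backup for an MDP in which every
        action deterministically yields either a successor state or the
        terminal state (signalled here by next_state is None).

        step(s, a) -> (next_state, reward) is an assumed interface,
        not taken from the paper."""
        Q = {(s, a): 0.0 for s in states for a in actions}
        V = {s: 0.0 for s in states}
        # With deterministic transitions, each sweep propagates exact values
        # one step further back from the terminal state, so |S| sweeps fix V
        # and one more settles every Q entry -- assuming, as in the paper's
        # grid-world, that optimal paths reach the terminal state.
        for _ in range(len(states) + 1):
            changed = False
            for s in states:
                for a in actions:
                    s_next, r = step(s, a)
                    target = r if s_next is None else r + gamma * V[s_next]
                    if Q[(s, a)] != target:
                        Q[(s, a)] = target
                        changed = True
                V[s] = max(Q[(s, a)] for a in actions)
            if not changed:
                break  # exact fixed point reached early
        return Q

    # Toy usage: a 3-cell corridor; moving right from cell 2 ends the episode.
    states = [0, 1, 2]
    actions = ["left", "right"]

    def step(s, a):
        if a == "right":
            return (None, 1.0) if s == 2 else (s + 1, 0.0)
        return (max(s - 1, 0), 0.0)

    Q = semi_dp_sketch(states, actions, step, gamma=0.9)
    print(Q[(0, "right")])  # 0.81 = 0.9**2 * 1.0

Because every transition is deterministic, each sweep extends the set of exactly computed values one step further back from the terminal state, which is the intuition behind the finite-convergence claim in the abstract; a stochastic MDP would instead require asymptotic convergence.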