Academic Paper

Comparing different methods to speed up reinforcement learning in a complex domain
Document Type
Conference
Source
2005 IEEE International Conference on Systems, Man and Cybernetics, vol. 4, pp. 3185-3190, 2005
Subject
Robotics and Control Systems
Computing and Processing
Components, Circuits, Devices and Systems
Learning
Robots
Algorithm design and analysis
Convergence
Reinforcement Learning
MDP
Q-Learning
options
SMDP homomorphisms
SMDP
Language
English
ISSN
1062-922X
Abstract
We introduce a new learning algorithm (the semi-DP algorithm) designed for MDPs (Markov decision processes) in which every action leads either to a deterministic successor state or to the terminal state. The algorithm needs only a finite number of loops to converge exactly to the optimal action-value function. In a soccer grid-world, we compare this algorithm, together with three other methods for speeding up or simplifying the learning process, against ordinary Q-learning. Furthermore, we show that different reward functions can considerably change the convergence time of the learning algorithms even when the optimal policy remains unchanged.
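
Note: the abstract does not spell out the semi-DP backup itself. What follows is a minimal Python sketch of the kind of dynamic-programming update that converges exactly in finitely many sweeps on such MDPs, under the assumption of an environment interface step(s, a) -> (next_state, reward) where next_state is None signals termination. All identifiers here (semi_dp_sketch, step, the corridor toy) are hypothetical illustrations, not the authors' code.

    def semi_dp_sketch(states, actions, step, gamma=0.9):
        """Hypothetical dynamic-programming backup for an MDP in which every
        action deterministically yields either a successor state or the
        terminal state (signalled here by next_state is None).

        step(s, a) -> (next_state, reward) is an assumed interface,
        not taken from the paper."""
        Q = {(s, a): 0.0 for s in states for a in actions}
        V = {s: 0.0 for s in states}
        # With deterministic transitions, each sweep propagates exact values
        # one step further back from the terminal state, so |S| sweeps fix V
        # and one more settles every Q entry -- assuming, as in the paper's
        # grid-world, that optimal paths reach the terminal state.
        for _ in range(len(states) + 1):
            changed = False
            for s in states:
                for a in actions:
                    s_next, r = step(s, a)
                    target = r if s_next is None else r + gamma * V[s_next]
                    if Q[(s, a)] != target:
                        Q[(s, a)] = target
                        changed = True
                V[s] = max(Q[(s, a)] for a in actions)
            if not changed:
                break  # exact fixed point reached early
        return Q

    # Toy usage: a 3-cell corridor; moving right from cell 2 ends the episode.
    states = [0, 1, 2]
    actions = ["left", "right"]

    def step(s, a):
        if a == "right":
            return (None, 1.0) if s == 2 else (s + 1, 0.0)
        return (max(s - 1, 0), 0.0)

    Q = semi_dp_sketch(states, actions, step, gamma=0.9)
    print(Q[(0, "right")])  # 0.81 = 0.9**2 * 1.0

Because every transition is deterministic, each sweep extends the set of exactly computed values one step further back from the terminal state, which is the intuition behind the finite-convergence claim in the abstract; a stochastic MDP would instead require asymptotic convergence.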