Academic Article

k-Certainty Exploration Method : An Action Selector on Reinforcement Learning to Identify the Environment / k-確実探査法 : 強化学習における環境同定のための行動選択戦略
Document Type
Journal Article
Source
人工知能 / Journal of the Japanese Society for Artificial Intelligence. 1995, 10(3):454
Subject
Q-learning
k-certainty
k-certainty exploration method
policy iteration algorithm
reinforcement learning
Language
Japanese
ISSN
2188-2266
2435-8614
Abstract
Reinforcement learning aims to adapt a system to an unknown environment according to rewards. Two issues must be handled: delayed reward and uncertainty. Q-learning is a representative reinforcement learning method and is used in many studies because it can learn the optimum policy. However, Q-learning needs numerous trials to converge to the optimum policy. If a target environment can be described as a Markov decision process, it can be identified from statistics of sensor-action pairs. Once a correct environment model has been built, the optimum policy can be derived with the Policy Iteration Algorithm. Therefore, the optimum policy can be constructed by identifying the environment efficiently. In this paper, we separate the learning process into two phases: identifying the environment and determining the optimum policy. We propose the k-Certainty Exploration Method for identifying an environment; afterwards, the optimum policy is determined by the Policy Iteration Algorithm. We call a rule k-certain if and only if it has been selected more than k times. The k-Certainty Exploration Method suppresses any loop consisting of rules that have already achieved k-certainty. We show its effectiveness by comparing it with Q-learning in two experiments: one in the maze environment of Dyna, the other in an environment where the optimum policy varies according to a parameter.
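The exploration phase described in the abstract can be sketched as an action selector that tracks how often each state-action rule has been chosen and steers selection toward rules that are not yet k-certain. The sketch below is a minimal illustration of that idea, not the paper's full algorithm (in particular, the loop-suppression mechanism over already-certain rules is reduced here to simply preferring uncertain rules); all class and method names are hypothetical.

```python
import random
from collections import defaultdict

class KCertaintyExplorer:
    """Minimal sketch in the spirit of the k-Certainty Exploration Method:
    keep selecting each state-action rule until it becomes k-certain, so the
    environment model can be identified before running policy iteration."""

    def __init__(self, actions, k):
        self.actions = actions
        self.k = k
        # Number of times each (state, action) rule has been selected.
        self.counts = defaultdict(int)

    def is_k_certain(self, state, action):
        # A rule is k-certain once it has been selected more than k times.
        return self.counts[(state, action)] > self.k

    def select(self, state):
        # Prefer rules that are not yet k-certain, so exploration does not
        # keep looping over rules whose statistics are already sufficient.
        uncertain = [a for a in self.actions if not self.is_k_certain(state, a)]
        action = random.choice(uncertain if uncertain else self.actions)
        self.counts[(state, action)] += 1
        return action

    def fully_identified(self, states):
        # All rules k-certain: hand the estimated model to policy iteration.
        return all(self.is_k_certain(s, a)
                   for s in states for a in self.actions)
```

Once `fully_identified` returns true, the transition and reward statistics collected during exploration define an estimated Markov decision process from which the Policy Iteration Algorithm derives the optimum policy.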

Online Access