Academic Article

k-Certainty Exploration Method : An Action Selector on Reinforcement Learning to Identify the Environment / k-確実探査法 : 強化学習における環境同定のための行動選択戦略
Document Type
Journal Article
Source
人工知能 / Journal of the Japanese Society for Artificial Intelligence. 1995, 10(3):454
Subject
Q-learning
k-certainty
k-certainty exploration method
policy iteration algorithm
reinforcement learning
Language
Japanese
ISSN
2188-2266
2435-8614
Abstract
Reinforcement learning aims to adapt a system to an unknown environment according to rewards. Two issues must be handled: delayed reward and uncertainty. Q-learning is a representative reinforcement learning method and is used in many studies because it can learn the optimum policy. However, Q-learning needs numerous trials to converge to the optimum policy. If a target environment can be described as a Markov decision process, it can be identified from statistics of sensor-action pairs. Once a correct environment model has been built, the optimum policy can be derived with the Policy Iteration Algorithm. Therefore, the optimum policy can be constructed by identifying the environment efficiently. In this paper, we separate the learning process into two phases: identifying the environment and determining the optimum policy. We propose the k-Certainty Exploration Method for identifying an environment; afterwards, the optimum policy is determined by the Policy Iteration Algorithm. We call a rule k-certain if and only if it has been selected more than k times. The k-Certainty Exploration Method suppresses any loop consisting of rules that have already achieved k-certainty. We show its effectiveness by comparing it with Q-learning in two experiments: one in the maze environment of Dyna, the other in an environment where the optimum policy varies according to a parameter.
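The exploration phase described in the abstract can be sketched as an action selector that tracks how often each state-action rule has been chosen and steers selection toward rules that are not yet k-certain. The sketch below is a minimal illustration of that idea, not the paper's full algorithm (in particular, the loop-suppression mechanism over already-certain rules is reduced here to simply preferring uncertain rules); all class and method names are hypothetical.

```python
import random
from collections import defaultdict

class KCertaintyExplorer:
    """Minimal sketch in the spirit of the k-Certainty Exploration Method:
    keep selecting each state-action rule until it becomes k-certain, so the
    environment model can be identified before running policy iteration."""

    def __init__(self, actions, k):
        self.actions = actions
        self.k = k
        # Number of times each (state, action) rule has been selected.
        self.counts = defaultdict(int)

    def is_k_certain(self, state, action):
        # A rule is k-certain once it has been selected more than k times.
        return self.counts[(state, action)] > self.k

    def select(self, state):
        # Prefer rules that are not yet k-certain, so exploration does not
        # keep looping over rules whose statistics are already sufficient.
        uncertain = [a for a in self.actions if not self.is_k_certain(state, a)]
        action = random.choice(uncertain if uncertain else self.actions)
        self.counts[(state, action)] += 1
        return action

    def fully_identified(self, states):
        # All rules k-certain: hand the estimated model to policy iteration.
        return all(self.is_k_certain(s, a)
                   for s in states for a in self.actions)
```

Once `fully_identified` returns true, the transition and reward statistics collected during exploration define an estimated Markov decision process from which the Policy Iteration Algorithm derives the optimum policy.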

Online Access