Academic Paper

Reinforcement learning without an explicit terminal state
Document Type
Conference
Source
1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227), vol. 3, pp. 1998-2003, 1998
Subject
Computing and Processing
Components, Circuits, Devices and Systems
Signal Processing and Analysis
Learning
Control systems
Cost function
Process control
Dynamic programming
Convergence
Optimal control
Chemical reactors
Electronic mail
Temperature control
Language
English
ISSN
1098-7576
Abstract
Introduces a reinforcement learning framework based on dynamic programming for a class of control problems in which no explicit terminal state exists. This situation arises especially in the context of technical process control: the control task is not terminated once a predefined target value is reached; instead, the controller has to keep controlling the system to prevent its output from drifting away from the target value again. We propose a set of assumptions and prove the convergence of the value iteration method. From this, a new algorithm, which we call the fixed horizon algorithm, is derived. The performance of the proposed algorithm is compared to an approach that assumes the existence of an explicit terminal state. The application to a cart/double-pole system finally demonstrates the method on a difficult practical control task.
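
To illustrate the idea behind a fixed-horizon treatment of tasks without a terminal state, the sketch below performs finite-horizon value iteration: instead of backing values up until an absorbing goal state is reached, the cost-to-go is computed over a fixed lookahead of H steps. This is a minimal sketch based only on the abstract; the tabular MDP setup, the cost matrix c, the transition tensor P, and the function name fixed_horizon_value_iteration are illustrative assumptions, not the paper's actual formulation or notation.

```python
import numpy as np

def fixed_horizon_value_iteration(P, c, H):
    """Finite-horizon value iteration for a task with no terminal state.

    P : (A, S, S) array, P[a, s, s'] = transition probability under action a
    c : (A, S)    array, c[a, s]     = immediate cost of action a in state s
    H : fixed lookahead horizon (number of remaining decision steps)

    Returns the H-step optimal cost-to-go V and a greedy policy.
    """
    V = np.zeros(P.shape[1])                    # zero cost at the horizon, not a goal state
    for _ in range(H):
        Q = c + np.einsum('ast,t->as', P, V)    # one-step backup for every action
        V = Q.min(axis=0)                       # optimal cost-to-go with one more step to go
    policy = Q.argmin(axis=0)                   # greedy action given the full horizon
    return V, policy

# Tiny hypothetical example: keep a 2-state system at state 0 (the target).
P = np.array([[[0.9, 0.1], [0.8, 0.2]],        # action 0
              [[0.5, 0.5], [0.1, 0.9]]])       # action 1
c = np.array([[0.0, 1.0],                      # cost 1 whenever the system is off target
              [0.2, 1.2]])                     # action 1 adds actuation cost
V, policy = fixed_horizon_value_iteration(P, c, H=50)
print(V, policy)
```

Note that because no state is absorbing, the undiscounted cost-to-go grows with the horizon; fixing H keeps the backup well defined, which is the situation the paper's convergence analysis addresses.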