Academic Journal Article

Efficient Off-Policy Q-Learning for Data-Based Discrete-Time LQR Problems
Document Type
Periodical
Source
IEEE Transactions on Automatic Control, 68(5):2922-2933, May 2023
Subject
Signal Processing and Analysis
Q-learning
Heuristic algorithms
Data models
Convergence
Trajectory
Prediction algorithms
Linear systems
Data-based control
optimal control
reinforcement learning (RL)
Language
English
ISSN
0018-9286
1558-2523
2334-3303
Abstract
This article introduces and analyzes an improved Q-learning algorithm for discrete-time linear time-invariant systems. The proposed method requires no knowledge of the system dynamics and enjoys significant efficiency advantages over other data-based optimal control methods in the literature. The algorithm can be executed fully offline, since, unlike on-policy algorithms, it does not require applying the current estimate of the optimal input to the system. It is shown that a persistently exciting (PE) input, characterized by an easily tested matrix rank condition, guarantees convergence of the algorithm. A data-based method is proposed to design the initial stabilizing feedback gain that the algorithm requires. Robustness of the algorithm in the presence of noisy measurements is analyzed. We compare the proposed algorithm in simulation with several direct and indirect data-based control design methods.
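The abstract describes an off-policy, data-based Q-learning scheme for discrete-time LQR: data are collected once with an exploratory (persistently exciting) input, the Q-function of the current gain is estimated by least squares, and the gain is then improved, all without applying the estimated policy to the plant. The Python sketch below illustrates that general idea only; the system matrices A and B, the LQR weights, the zero initial gain, and the simple random excitation are assumptions for illustration, not the authors' exact algorithm, rank test, or initial-gain design.

import numpy as np

def quad_basis(z):
    # Monomials z_i*z_j (i <= j) parameterizing the quadratic Q-function z'Hz.
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def mat_from_params(theta, n):
    # Rebuild the symmetric matrix H from its upper-triangular parameters.
    H = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[idx]
            idx += 1
    return H

# Hypothetical Schur-stable system and LQR weights (assumptions, not from the paper).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Qw, Rw = np.eye(2), np.eye(1)
n, m = 2, 1

# Collect one exploratory trajectory offline (behavior policy = random input).
rng = np.random.default_rng(0)
T = 200
x = np.zeros((T + 1, n))
u = rng.normal(size=(T, m))
for k in range(T):
    x[k + 1] = A @ x[k] + B @ u[k]

# K = 0 is stabilizing here because A is assumed stable; the paper instead
# proposes a data-based design of the initial stabilizing gain.
K = np.zeros((m, n))
for _ in range(20):                                   # policy iteration on the same data
    Phi, c = [], []
    for k in range(T):
        zk  = np.concatenate([x[k], u[k]])            # behavior sample (exploratory input)
        zk1 = np.concatenate([x[k + 1], -K @ x[k + 1]])  # target policy evaluated at x_{k+1}
        Phi.append(quad_basis(zk) - quad_basis(zk1))  # Bellman residual regressor
        c.append(x[k] @ Qw @ x[k] + u[k] @ Rw @ u[k]) # stage cost
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = mat_from_params(theta, n + m)                 # estimated Q-function matrix
    K = np.linalg.solve(H[n:, n:], H[n:, :n])         # policy improvement: K = H_uu^{-1} H_ux

print("estimated gain K:\n", K)

Because the trajectory is generated by the exploratory input rather than by the gain being evaluated, the same stored data are reused at every iteration, which is what makes the procedure off-policy and fully offline in the sense of the abstract.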