Academic Journal Article

Receding Horizon Actor–Critic Learning Control for Nonlinear Time-Delay Systems With Unknown Dynamics
Document Type
Periodical
Source
IEEE Transactions on Systems, Man, and Cybernetics: Systems, 53(8):4980–4993, Aug. 2023
Subject
Signal Processing and Analysis
Robotics and Control Systems
Power, Energy and Industry Applications
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
General Topics for Engineers
Delay effects
Optimal control
Control systems
Stability criteria
Simulation
Predictive control
Costs
Discrete-time nonlinear systems
Koopman operator
receding horizon control
reinforcement learning (RL)
time-delay systems
Language
English
ISSN
2168-2216 (print)
2168-2232 (electronic)
Abstract
With the development of modern mechatronics and networked systems, controller design for time-delay systems has received notable attention. Time delays can greatly influence the stability and performance of such systems, especially in optimal control design. In this article, we propose a receding horizon actor–critic learning control approach for near-optimal control of nonlinear time-delay systems (RACL-TD) with unknown dynamics. In the proposed approach, a data-driven predictor for nonlinear time-delay systems is first learned from precollected samples based on Koopman operator theory. Then, a receding horizon actor–critic architecture is designed to learn a near-optimal control policy. In RACL-TD, the terminal cost is determined using the Lyapunov–Krasovskii approach so that the influence of the delayed states and control inputs is well addressed. Furthermore, a relaxed terminal condition is presented to reduce the computational cost. The convergence and optimality of RACL-TD in each prediction interval, as well as the closed-loop properties of the system, are discussed and analyzed. Simulation results on a two-stage time-delayed chemical reactor illustrate that RACL-TD achieves better control performance than nonlinear model predictive control (MPC) and infinite-horizon adaptive dynamic programming. Moreover, RACL-TD incurs a lower computational cost than nonlinear MPC.
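As context for the prediction step described in the abstract, the sketch below illustrates in Python how an EDMD-style Koopman predictor for a time-delay system could be fitted from precollected samples. This is a minimal sketch under stated assumptions, not the authors' implementation: the delay-embedding window, the polynomial observable dictionary, and the names `lift` and `fit_koopman_predictor` are all illustrative. Stacking the d most recent states makes delayed dependencies appear as ordinary components of the lifted state, which is what makes a linear predictor of the form z_{k+1} = A z_k + B u_k plausible.

```python
import numpy as np

def lift(window):
    """Lift a delay-embedded state window into observable space.
    The dictionary used here (stacked delayed states, their squares,
    and a constant) is a hypothetical choice for illustration only.
    """
    z = np.concatenate(window)                # delay embedding of past states
    return np.concatenate([z, z**2, [1.0]])   # polynomial observables + bias

def fit_koopman_predictor(windows, inputs, next_states):
    """EDMD-style least-squares fit of a lifted linear predictor
        z_{k+1} ~= A z_k + B u_k
    from precollected samples: windows[i] is a list of the d most recent
    states, inputs[i] the applied control, next_states[i] the state one
    step later.
    """
    Z  = np.array([lift(w) for w in windows])
    Zp = np.array([lift(w[1:] + [xn]) for w, xn in zip(windows, next_states)])
    ZU = np.hstack([Z, np.array(inputs)])     # regressors [z_k, u_k]
    # Least-squares solve of ZU @ Theta = Zp
    Theta, *_ = np.linalg.lstsq(ZU, Zp, rcond=None)
    nz = Z.shape[1]
    return Theta[:nz].T, Theta[nz:].T         # A (nz x nz), B (nz x nu)

# Hypothetical usage: windows of d = 2 delayed 2-D states, scalar input,
# and synthetic dynamics with a one-step state delay.
rng = np.random.default_rng(0)
windows = [[rng.standard_normal(2) for _ in range(2)] for _ in range(200)]
inputs = [rng.standard_normal(1) for _ in range(200)]
next_states = [0.9 * w[-1] + 0.1 * w[0] + 0.2 * u
               for w, u in zip(windows, inputs)]
A, B = fit_koopman_predictor(windows, inputs, next_states)
# One-step prediction in the lifted space: A @ lift(window) + B @ u
```

Once fitted, such a lifted linear model can be rolled out over the prediction horizon, which is what allows the receding horizon actor–critic design to plan without an analytical model of the delayed dynamics.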