Journal Article

RL-Driven MPPI: Accelerating Online Control Laws Calculation With Offline Policy
Document Type
Periodical
Source
IEEE Transactions on Intelligent Vehicles, 9(2):3605-3616, Feb. 2024
Subject
Transportation
Robotics and Control Systems
Components, Circuits, Devices and Systems
Costs
Optimal control
Vehicle dynamics
Trajectory
Complex systems
Task analysis
Real-time systems
Model predictive control (MPC)
reinforcement learning (RL)
unmanned aerial vehicle (UAV)
Language
English
ISSN
2379-8858
2379-8904
Abstract
Model Predictive Path Integral (MPPI) is a recognized sampling-based approach for finite-horizon optimal control problems. However, the efficacy and computational efficiency of prevailing MPPI methods rely heavily on the quality of rollouts. This is problematic because it is hard to sample a low-cost trajectory using random control sequences, leading to inferior performance and computational efficiency, especially under constrained resources. To address this issue, we propose a data-efficient MPPI method called reinforcement learning-driven MPPI (RL-driven MPPI), which significantly reduces the dependency on the quantity and quality of samples. RL-driven MPPI employs an offline-online policy learning scheme, where the offline policy learned by RL serves as the initial solution and the initial rollout generator of MPPI, effectively combining the strengths of both RL and MPPI. The rollouts generated by RL typically correspond to a lower cost-to-go than random sampling, which significantly boosts the sample efficiency and convergence speed of MPPI. Moreover, the value function learned by RL offers an accurate estimate of the infinite-horizon cost-to-go, enabling it to serve as a terminal term in the cost criterion of MPPI. This empowers MPPI to approximate an infinite-horizon cost with a shorter prediction horizon, thus enhancing real-time performance at each time step. The proposed method is evaluated on an unmanned aerial vehicle control task; results indicate that RL-driven MPPI exhibits superior control performance and sample efficiency.
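
To make the scheme concrete, below is a minimal Python sketch of the two ideas the abstract describes: perturbing an offline RL policy's actions to generate MPPI rollouts, and closing the finite horizon with the learned value function as a terminal cost term. All names (policy, value_fn, dynamics, cost, lambda_, sigma) are hypothetical placeholders, and the update shown is the standard exponentially weighted path-integral average, not the authors' exact implementation.

# Sketch of policy-driven MPPI with a learned terminal value.
# policy, value_fn, dynamics, and cost are user-supplied callables
# (hypothetical placeholders, not the paper's implementation).
import numpy as np

def rl_driven_mppi(x0, policy, value_fn, dynamics, cost,
                   horizon=20, n_samples=64, lambda_=1.0, sigma=0.5):
    """One MPPI step: perturb the offline RL policy's actions instead of
    a random nominal sequence, and close the horizon with the learned value."""
    costs = np.zeros(n_samples)
    noises = []
    for k in range(n_samples):
        x = x0.copy()
        eps_seq = []
        for t in range(horizon):
            u_nom = policy(x)                  # RL policy supplies the nominal action
            eps = sigma * np.random.randn(*u_nom.shape)
            eps_seq.append(eps)
            u = u_nom + eps                    # perturbed rollout action
            costs[k] += cost(x, u)
            x = dynamics(x, u)
        costs[k] += value_fn(x)                # learned value as terminal cost-to-go
        noises.append(eps_seq)
    # Path-integral weights: exponentiated, baseline-shifted costs
    w = np.exp(-(costs - costs.min()) / lambda_)
    w /= w.sum()
    # First-step control: policy action plus the cost-weighted noise correction
    u0 = policy(x0) + sum(w[k] * noises[k][0] for k in range(n_samples))
    return u0

Because the nominal sequence comes from the offline policy rather than random noise, low-cost rollouts are sampled with far fewer trajectories, and the value_fn terminal term lets the horizon stay short while still approximating the infinite-horizon cost, which is the source of the real-time and sample-efficiency gains the abstract claims.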