Journal Article

Trajectory Planning With Deep Reinforcement Learning in High-Level Action Spaces
Document Type
Periodical
Source
IEEE Transactions on Aerospace and Electronic Systems, 59(3):2513-2529, June 2023
Subject
Aerospace
Robotics and Control Systems
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Trajectory
Planning
Trajectory planning
Training
Reinforcement learning
Optimization
Aerodynamics
Language
English
ISSN
0018-9251 (print)
1557-9603 (electronic)
2371-9877 (CD)
Abstract
This article presents a technique for trajectory planning based on parameterized high-level actions. These high-level actions are subtrajectories that have variable shape and duration. The use of high-level actions can improve the performance of guidance algorithms. Specifically, we show how the use of high-level actions improves the performance of guidance policies that are generated via reinforcement learning (RL). RL has shown great promise for solving complex control, guidance, and coordination problems but can still suffer from long training times and poor performance. This work shows how the use of high-level actions reduces the required number of training steps and increases the path performance of an RL-trained guidance policy. We demonstrate the method on a space-shuttle guidance example. We show that the proposed method increases path performance (latitude range) by 18% compared with a baseline RL implementation. Similarly, we show that the proposed method reaches steady state during training with approximately 75% fewer training steps. We also show how the guidance policy enables effective performance in an obstacle field. Finally, this article develops a loss function term for policy-gradient-based deep RL, which is analogous to an antiwindup mechanism in feedback control. We demonstrate that the inclusion of this term in the underlying optimization increases the average policy return in our numerical example.
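Two short Python sketches illustrate the ideas summarized in the abstract. Both are hypothetical reconstructions from the abstract alone, not the paper's implementation; every name, interface, and parameterization in them is an assumption.

First, a parameterized high-level action can be realized as a subtrajectory whose shape and duration the policy selects, after which the environment integrates many low-level steps under the implied command profile. This minimal sketch assumes a gymnasium-style `env.step` interface and an illustrative polynomial control profile:

```python
# A minimal sketch, assuming a gymnasium-style environment and a polynomial
# control profile; `SubtrajectoryAction`, `dt`, and the profile itself are
# illustrative assumptions, not the paper's parameterization.
from dataclasses import dataclass

import numpy as np


@dataclass
class SubtrajectoryAction:
    """One high-level action: a subtrajectory of variable shape and duration."""
    shape_params: np.ndarray  # polynomial coefficients of the control profile
    duration: float           # time to fly this subtrajectory, in seconds


def execute_high_level_action(env, action, dt=0.1):
    """Integrate the low-level dynamics under one high-level action.

    The RL policy decides once per subtrajectory; the environment then steps
    the implied low-level command for `action.duration` seconds, so each
    policy decision spans many simulation steps.
    """
    n_steps = max(1, int(action.duration / dt))
    total_reward, obs, done = 0.0, None, False
    for k in range(n_steps):
        t = k / n_steps  # normalized time along the subtrajectory, in [0, 1)
        low_level_cmd = np.polyval(action.shape_params, t)
        obs, reward, terminated, truncated, _ = env.step(np.array([low_level_cmd]))
        total_reward += reward
        done = terminated or truncated
        if done:
            break
    return obs, total_reward, done
```

Second, the antiwindup-analogous loss term. The abstract does not give its exact form; one plausible analogue to antiwindup in feedback control is to penalize the policy network's pre-saturation output when it drifts beyond the actuator bounds, where gradients through the squashing nonlinearity would otherwise vanish. A PyTorch sketch under that assumption:

```python
# A hedged sketch of an antiwindup-style loss term in PyTorch. The abstract
# does not state the paper's formulation; the construction below is one
# plausible analogue, and `pre_squash_mean`, `action_limit`, and `weight`
# are assumed names.
import torch


def antiwindup_penalty(pre_squash_mean, action_limit=1.0, weight=1e-2):
    """Penalize the unsquashed policy mean for exceeding the actuator bound.

    When the network output saturates a tanh/clip, policy gradients through
    the saturation vanish and the mean can "wind up" far past the bound; this
    term keeps it near the usable range, like antiwindup in feedback control.
    """
    excess = torch.relu(pre_squash_mean.abs() - action_limit)
    return weight * (excess ** 2).mean()


# Usage sketch: fold the term into the usual policy-gradient objective,
# e.g. loss = surrogate_loss + antiwindup_penalty(mu, action_limit=1.0).
```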