Academic Paper

Sampling-based Inverse Reinforcement Learning Algorithms with Safety Constraints
Document Type
Conference
Source
2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 791-798, Sep. 2021
Subject
Robotics and Control Systems
Road transportation
Monte Carlo methods
Reinforcement learning
Traffic control
Markov processes
Entropy
Safety
Inverse Reinforcement Learning
Maximum Entropy
Constraints
SUMO
Language
English
ISSN
2153-0866
Abstract
Planning for robotic systems is frequently formulated as an optimization problem. Instead of manually tuning the parameters of the cost function, these parameters can be learned from human demonstrations by Inverse Reinforcement Learning (IRL). Common IRL approaches employ a maximum entropy trajectory distribution that can be learned with soft reinforcement learning, where the reward maximization is regularized with an entropy objective. The consideration of safety constraints is of paramount importance for human-robot collaboration. For this reason, our work addresses maximum entropy IRL in constrained environments. Our contribution to this research area is threefold: (1) We propose Constrained Soft Reinforcement Learning (CSRL), an extension of soft reinforcement learning to Constrained Markov Decision Processes (CMDPs). (2) We transfer maximum entropy IRL to CMDPs based on CSRL. (3) We show that using importance sampling in maximum entropy IRL in constrained environments introduces a bias and fails to achieve feature matching. In our evaluation, we consider the tactical lane-change decision of an autonomous vehicle in a highway scenario modeled in the SUMO traffic simulation.
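The soft reinforcement learning backup underlying maximum entropy IRL replaces the hard max of standard value iteration with a log-sum-exp, which induces the maximum entropy (softmax) trajectory distribution the abstract refers to. A minimal tabular sketch of this idea, assuming a finite MDP given as arrays (the function name and setup are illustrative, and the safety constraints of CSRL are omitted):

```python
import numpy as np

def soft_value_iteration(P, r, gamma=0.95, iters=500):
    """Tabular soft (maximum-entropy) value iteration.

    P: transition tensor of shape (S, A, S); r: reward matrix of shape (S, A).
    Returns the soft state values V and the maximum-entropy policy pi(a|s).
    The hard max of the Bellman backup is replaced by a log-sum-exp, so the
    resulting policy is the softmax of the soft Q-values.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r + gamma * (P @ V)            # soft Q-values, shape (S, A)
        m = Q.max(axis=1)                  # stabilizer for log-sum-exp
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
    pi = np.exp(Q - V[:, None])            # softmax policy over actions
    return V, pi
```

In maximum entropy IRL, the policy obtained this way under a parameterized reward is used to match expected feature counts against the demonstrations; the paper's point (3) concerns the bias that importance sampling introduces into this matching step once constraints are added.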