Academic Paper

Inferring Non-Stationary Human Preferences for Human-Agent Teams
Document Type
Conference
Source
2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 1178-1185, Aug. 2020
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Conferences
Decision making
Reinforcement learning
Markov processes
Task analysis
Robots
Language
ISSN
1944-9437
Abstract
One main challenge for robot decision making in human-robot teams is predicting the intents of a human team member from observations of the human's behavior. Inverse Reinforcement Learning (IRL) is one approach to predicting human intent; however, such approaches typically assume that the human's intent is stationary. Furthermore, few approaches identify when the human's intent changes during observations. Modeling human decision making as a Markov decision process, we address these two limitations by maintaining a belief over the reward parameters of the model (representing the human's preference for tasks or goals) and updating this belief using IRL estimates computed from short windows of observations. We posit that a human's preferences can change over time, due to gradual drift of preference and/or discrete, step-wise changes of intent. Our approach maintains an estimate of the human's preferences under such conditions and identifies changes of intent based on the divergence between successive belief updates. We demonstrate that our approach can effectively track dynamic reward parameters and identify changes of intent in a simulated environment, and that a robot team member can leverage this approach to improve team performance.
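The abstract does not specify the belief representation, update rule, or divergence measure the authors use. The sketch below is a minimal illustration of the stated idea only, assuming a diagonal-Gaussian belief over reward weights, a Kalman-style update that treats each windowed IRL estimate as a noisy observation, and a KL-divergence test between successive beliefs to flag a discrete change of intent. All names (PreferenceTracker, kl_gauss) and parameter values are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the abstract's idea, NOT the authors' implementation:
# track a Gaussian belief over reward parameters, update it with noisy IRL
# estimates from short observation windows, and flag a change of intent when
# successive beliefs diverge sharply (here via KL divergence).
import numpy as np

def kl_gauss(mu0, var0, mu1, var1):
    """KL divergence KL(N0 || N1) between two diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var1 / var0)
                        + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

class PreferenceTracker:
    def __init__(self, dim, drift_var=1e-3, obs_var=5e-2, kl_threshold=2.0):
        self.mu = np.zeros(dim)           # belief mean over reward weights
        self.var = np.ones(dim)           # belief variance (diagonal)
        self.drift_var = drift_var        # process noise: gradual preference drift
        self.obs_var = obs_var            # noise of a windowed IRL estimate
        self.kl_threshold = kl_threshold  # divergence that signals a new intent

    def update(self, irl_estimate):
        """Fuse one windowed IRL estimate; return True if intent changed."""
        prior_mu, prior_var = self.mu.copy(), self.var.copy()
        # predict step: inflate variance to allow slow drift of preferences
        var_pred = self.var + self.drift_var
        # correct step: Kalman-style fusion of the noisy IRL estimate
        gain = var_pred / (var_pred + self.obs_var)
        self.mu = self.mu + gain * (irl_estimate - self.mu)
        self.var = (1.0 - gain) * var_pred
        # change-of-intent test: divergence between successive beliefs
        if kl_gauss(prior_mu, prior_var, self.mu, self.var) > self.kl_threshold:
            # discrete, step-wise change: restart with a broad belief
            self.mu = np.asarray(irl_estimate, dtype=float).copy()
            self.var = np.ones_like(self.var)
            return True
        return False
```

Under these assumptions, a gradual preference drift produces small per-window divergences and is absorbed by the predict step, while a step-wise change of intent produces a divergence spike that trips the threshold, e.g. `tracker = PreferenceTracker(dim=3)` followed by `tracker.update(estimate)` once per observation window.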