Academic Paper
Inferring Non-Stationary Human Preferences for Human-Agent Teams
Document Type
Conference
Source
2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 1178-1185, Aug. 2020
ISSN
1944-9437
Abstract
One main challenge to robot decision making in human-robot teams involves predicting the intents of a human team member through observations of the human's behavior. Inverse Reinforcement Learning (IRL) is one approach to predicting human intent; however, such approaches typically assume that the human's intent is stationary. Furthermore, few approaches identify when the human's intent changes during observations. Modeling human decision making as a Markov decision process, we address these two limitations by maintaining a belief over the reward parameters of the model (representing the human's preference for tasks or goals), and updating the parameters using IRL estimates from short windows of observations. We posit that a human's preferences can change with time, due to gradual drift of preference and/or discrete, step-wise changes of intent. Our approach maintains an estimate of the human's preferences under such conditions, and is able to identify changes of intent based on the divergence between subsequent belief updates. We demonstrate that our approach can effectively track dynamic reward parameters and identify changes of intent in a simulated environment, and that this approach can be leveraged by a robot team member to improve team performance.
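The tracking-and-detection idea in the abstract can be sketched in a toy form: maintain a belief over reward parameters, fold in each short-window IRL estimate as a noisy observation, and flag a change of intent when the divergence between successive belief updates spikes. The sketch below is an illustrative assumption, not the paper's implementation: it uses a diagonal Gaussian belief, a random-walk model for gradual preference drift, KL divergence as the change signal, and simulated numbers standing in for the windowed IRL estimates; all parameter values are made up for the demo.

```python
import numpy as np

def kl_gaussian(mu_new, var_new, mu_old, var_old):
    """KL(new || old) between two diagonal Gaussian beliefs."""
    return 0.5 * np.sum(
        np.log(var_old / var_new)
        + (var_new + (mu_new - mu_old) ** 2) / var_old
        - 1.0
    )

def update_belief(mu, var, estimate, obs_var=0.25, drift_var=0.05):
    """One tracking step: inflate the variance to allow gradual preference
    drift, then fold in a windowed IRL estimate as a noisy observation
    (standard conjugate Gaussian update)."""
    var_pred = var + drift_var                            # random-walk drift model
    new_var = 1.0 / (1.0 / var_pred + 1.0 / obs_var)      # precisions add
    new_mu = new_var * (mu / var_pred + estimate / obs_var)
    return new_mu, new_var

rng = np.random.default_rng(0)
true_w = np.array([1.0, 0.0])   # hidden preference over two tasks/goals
mu = true_w.copy()              # belief initialized at the initial preference
var = np.ones(2)
THRESHOLD = 1.0                 # divergence level treated as a change of intent
changes = []

for t in range(40):
    if t == 20:
        true_w = np.array([0.0, 1.0])   # discrete, step-wise change of intent
    # Simulated stand-in for an IRL estimate from a short window of observations.
    estimate = true_w + 0.02 * rng.standard_normal(2)
    new_mu, new_var = update_belief(mu, var, estimate)
    if kl_gaussian(new_mu, new_var, mu, var) > THRESHOLD:
        changes.append(t)
        new_mu, new_var = estimate, np.ones(2)   # re-initialize belief after the change
    mu, var = new_mu, new_var

print(changes)  # the step-wise change at t = 20 produces a divergence spike
```

The drift-variance term keeps the belief from becoming overconfident, so the filter continues to track gradual drift, while an abrupt jump in the estimates moves the posterior mean enough in one step that the KL divergence between successive beliefs crosses the threshold.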