Academic Paper

Interpreting Tangled Program Graphs Under Partially Observable Dota 2 Invoker Tasks
Document Type
Periodical
Source
IEEE Transactions on Artificial Intelligence, 5(4):1511-1524, Apr. 2024
Subject
Computing and Processing
Task analysis
Artificial intelligence
Reinforcement learning
Registers
Genetic programming
Complexity theory
Visualization
Emergence
evolutionary computation
genetic programming (GP)
interpretable machine learning
ISSN
2691-4581
Abstract
Interpretable learning agents directly construct models that provide insight into the relationships learnt. To date, however, most of the emphasis has been on interpreting reactive models developed for supervised learning tasks. In this article, we consider models developed to address a suite of six partially observable tasks defined in the Dota 2 Online Battle Arena game engine. Learning agents must therefore base their decisions on previous states, as retained in the agent's memory, in addition to the 310-D state vector provided by the game engine. Interpretability is addressed by adopting the tangled program graph approach to developing learning agents. Decision making is thus explicitly divide-and-conquer, with different parts of the resulting graph visited depending on the task context. We demonstrate that the programs comprising the tangled program graph self-organize such that: 1) small subsets of task features are identified to define the conditions under which indexed memory is written, and 2) the programs responsible for defining actions typically query indexed memory rather than task features. Task-specific preferences also emerge: the blocking (or evasion) tasks result in a preference for specific actions, whereas more open-ended tasks adopt policies based on combinations of behaviors. In short, the ability to evolve the topology of the learning agent provides insight into how policies are constructed for partially observable tasks.
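The divide-and-conquer decision making mentioned in the abstract can be illustrated with a minimal sketch of tangled program graph (TPG) execution. It assumes the commonly described TPG bidding scheme: a team holds several programs, each program produces a bid from its inputs, and the highest bidder's action is either an atomic action or a pointer to another team, with teams already visited during the current decision skipped. Programs are reduced here to hypothetical linear bid functions, and the agent's memory is simply concatenated to the game state; the paper's actual program representation and its rules for writing indexed memory are not reproduced.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Program:
    # Hypothetical simplification: a program is a linear bid over (state + memory).
    weights: List[float]
    # Either an atomic action id or a pointer to another team.
    action: Union[int, "Team"]

    def bid(self, inputs: List[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, inputs))


@dataclass
class Team:
    programs: List[Program]


def act(root: Team, state: List[float], memory: List[float]) -> int:
    """Follow winning bids from the root team until an atomic action is reached.

    Assumes every team keeps at least one program with an atomic action,
    so the candidate list is never empty and traversal always terminates.
    """
    inputs = state + memory  # memory is read alongside the game state
    team = root
    visited = set()
    while True:
        visited.add(id(team))
        # Skip programs pointing back to a team already visited this step.
        candidates = [
            p for p in team.programs
            if isinstance(p.action, int) or id(p.action) not in visited
        ]
        winner = max(candidates, key=lambda p: p.bid(inputs))
        if isinstance(winner.action, int):
            return winner.action
        team = winner.action
```

For example, with `leaf = Team([Program([1.0, 0.0], 0), Program([0.0, 1.0], 1)])` and `root = Team([Program([0.0, 0.0], 2), Program([1.0, 1.0], leaf)])`, the call `act(root, [0.5], [0.9])` routes through `leaf`, where the memory input outbids the state input and action `1` is chosen; changing only the memory to `[0.2]` yields action `0`. Only the teams on the winning path are evaluated, which is the sense in which different parts of the graph are visited depending on context.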