Journal Article

A Comparison of Self-Play Algorithms Under a Generalized Framework
Document Type
Periodical
Source
IEEE Transactions on Games, 14(2):221-231, Jun. 2022
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Training
Games
Measurement
Statistics
Sociology
Heuristic algorithms
Reinforcement learning
Emergent phenomena
machine learning
multi-agent systems
Language
ISSN
2475-1502
2475-1510
Abstract
The notion of self-play, although often invoked in multiagent reinforcement learning as a process for training agent policies from scratch, has received little effort toward taxonomization within a formal model. We present a formalized framework, with clearly defined assumptions, that encapsulates the meaning of self-play as abstracted from various existing self-play algorithms. The framework is positioned as an approximation to a theoretical solution concept for multiagent training. Using a novel qualitative visualization metric on a simple environment, we show that different self-play algorithms generate different distributions of episode trajectories, leading to different explorations of the policy space by the learning agents. Quantitatively, on two environments, we analyze the learning dynamics of policies trained under the different self-play algorithms captured by our framework and perform cross-self-play performance comparisons. Our results indicate that, throughout training, several widely used self-play algorithms exhibit cyclic policy evolution and that the choice of self-play algorithm significantly affects the final performance of trained agents.
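As a rough illustration of the kind of generalization the abstract describes, the following minimal sketch factors a self-play loop into pluggable pieces: a menagerie of stored policies, an opponent-sampling rule, and a gating rule deciding which trained policies enter the menagerie. All names here (`naive_sampler`, `uniform_sampler`, `self_play`, the toy float-valued "policies") are hypothetical illustrations, not the paper's actual formalism.

```python
import random

def naive_sampler(menagerie):
    # Naive self-play: always train against the most recent policy.
    return menagerie[-1]

def uniform_sampler(menagerie):
    # Uniform-style self-play: sample uniformly over stored past policies.
    return random.choice(menagerie)

def self_play(train_step, sampler, gate, initial_policy, iterations):
    """Generic self-play loop parameterized by sampling and gating rules.

    Policies here are plain floats standing in for real parameterized
    policies; `train_step` is a stand-in for an RL update.
    """
    menagerie = [initial_policy]
    policy = initial_policy
    for _ in range(iterations):
        opponent = sampler(menagerie)       # choose who to train against
        policy = train_step(policy, opponent)
        if gate(policy, menagerie):         # decide whether to store it
            menagerie.append(policy)
    return policy, menagerie

if __name__ == "__main__":
    random.seed(0)
    # Toy "training": nudge the policy toward the sampled opponent,
    # plus a small improvement term.
    toy_train = lambda p, opp: p + 0.1 * (opp - p) + 0.05
    always_gate = lambda p, m: True
    final, menagerie = self_play(toy_train, uniform_sampler,
                                 always_gate, 0.0, 50)
    print(len(menagerie), round(final, 3))
```

Swapping `uniform_sampler` for `naive_sampler` (or changing the gating rule) yields a different self-play algorithm within the same loop, which is the sense in which distinct algorithms can be compared under one framework.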