Academic Paper

ParPER: A Partitioned Prioritized Experience Replay in Multi-Agent Setting of Reinforcement Learning
Document Type
Conference
Source
2023 5th International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 932-938, Jan. 2023
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Training
Reinforcement learning
Stability analysis
Multi-Agent Reinforcement Learning (MARL)
Multi-Agent Deep Deterministic Policy Gradient (MADDPG)
Prioritized Experience Replay (PER)
Partitioned PER (ParPER)
ISSN
2832-3017
Abstract
Multi-Agent Reinforcement Learning (MARL) is nowadays used to model numerous real-life applications, particularly scenarios in which only very limited information about the environment is available. Off-policy reinforcement learning, including off-policy MARL, stores the experiences (transitions) of every agent in a replay buffer and later samples from it for training. Prioritized Experience Replay (PER) samples transitions according to the magnitude of their training error: the larger the error, the higher the probability that the transition is sampled for the next update. Though effective, this method has a drawback: a transition with a low error might never be sampled, even though sampling that particular transition may be important for the model to learn an optimal policy. We propose Partitioned PER (ParPER), which partitions the memory into fixed-length segments and samples an equal number of transitions from each segment independently. We test the proposed method on the Multi-Agent Particle Environment (MPE), and the results are promising.
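The partitioned sampling scheme the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation; the class name, the per-partition prioritization by stored priority, and all parameter names are assumptions for the example.

```python
import random


class PartitionedReplayBuffer:
    """Sketch of a ParPER-style buffer: the memory is split into
    fixed-length partitions, and each sampled batch draws an equal
    number of transitions from every partition independently, so
    low-error transitions in one region cannot be starved by
    high-error transitions elsewhere. (Illustrative only.)"""

    def __init__(self, capacity, num_partitions):
        assert capacity % num_partitions == 0
        self.capacity = capacity
        self.num_partitions = num_partitions
        self.part_len = capacity // num_partitions
        self.storage = []   # list of (transition, priority) pairs
        self.pos = 0        # next write index (circular overwrite)

    def add(self, transition, priority=1.0):
        # Store the transition together with its priority (e.g. |TD error|).
        if len(self.storage) < self.capacity:
            self.storage.append((transition, priority))
        else:
            self.storage[self.pos] = (transition, priority)
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        """Draw batch_size transitions, split equally across partitions,
        prioritized *within* each partition only."""
        per_part = batch_size // self.num_partitions
        batch = []
        for p in range(self.num_partitions):
            start = p * self.part_len
            part = self.storage[start:start + self.part_len]
            if not part:
                continue  # partition not yet filled
            total = sum(pri for _, pri in part)
            weights = [pri / total for _, pri in part]
            picks = random.choices(part, weights=weights,
                                   k=min(per_part, len(part)))
            batch.extend(t for t, _ in picks)
        return batch
```

With two partitions and a batch of four, two transitions come from the first half of the memory and two from the second half, regardless of how the priorities are distributed across the halves.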