Academic Paper

Predictive PER: Balancing Priority and Diversity Towards Stable Deep Reinforcement Learning
Document Type
Conference
Source
2021 International Joint Conference on Neural Networks (IJCNN), pp. 1-10, Jul. 2021
Subject
Bioengineering
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Training
Neural networks
Buildings
Games
Reinforcement learning
Explosions
prioritized experience replay
deep Q-learning
forgetting
stability
reinforcement learning
DQN
Atari games
Language
English
ISSN
2161-4407
Abstract
Prioritized experience replay (PER) samples important transitions more frequently, rather than uniformly, to improve the data efficiency of a deep reinforcement learning agent. We claim that such prioritization must be balanced with sample diversity to keep the deep Q-network (DQN) stable and to prevent severe forgetting. Our proposed improvement over PER, called Predictive PER (PPER), takes three countermeasures (TDInit, TDClip, TDPred) to (i) eliminate priority outliers and explosions and (ii) improve the diversity of samples and their distributions, weighted by priorities. Both contribute to stabilizing the learning process and thus to forgetting less. The most notable of the three is TDPred, a second DNN introduced to generalize in-distribution priorities. Ablation and experimental studies on Atari games show that each countermeasure in its own way, and PPER as a whole, successfully contribute to enhancing stability, and hence performance, over PER.
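
For context, the sketch below shows proportional prioritized sampling in the style of the original PER (priorities proportional to |TD error|, sampling probability p_i^alpha / sum_k p_k^alpha, with importance-sampling weights), plus a simple cap on the TD error used as priority to illustrate how a clipping-style countermeasure can curb priority outliers. This is an illustrative sketch only, not the authors' PPER implementation; the class, parameter, and method names (ProportionalReplay, td_clip, and so on) are assumptions.

import numpy as np

class ProportionalReplay:
    """Minimal proportional prioritized replay with an optional priority cap.
    Illustrative sketch; not the PPER code from the paper."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, td_clip=1.0, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities skew sampling (0 = uniform)
        self.beta = beta          # strength of the importance-sampling correction
        self.td_clip = td_clip    # cap on |TD error| used as priority (assumed value)
        self.eps = eps            # keeps every transition sampleable
        self.data, self.priorities = [], []
        self.pos = 0

    def add(self, transition, td_error):
        # Priority = clipped |TD error| + small epsilon.
        p = min(abs(td_error), self.td_clip) + self.eps
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            # Ring buffer: overwrite the oldest slot.
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        pri = np.asarray(self.priorities) ** self.alpha
        probs = pri / pri.sum()   # P(i) = p_i^alpha / sum_k p_k^alpha
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after the learner recomputes TD errors.
        for i, td in zip(idx, td_errors):
            self.priorities[i] = min(abs(td), self.td_clip) + self.eps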