Academic Paper

Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States
Document Type
Conference
Source
2023 62nd IEEE Conference on Decision and Control (CDC), pp. 7009-7014, Dec. 2023
Subject
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Training
Reinforcement learning
Boosting
Stability analysis
Task analysis
Optimization
Language
English
ISSN
2576-2370
Abstract
Improving exploration and exploitation through more efficient use of samples is a critical issue in reinforcement learning algorithms. A basic strategy for a learning algorithm is to explore the entire environment state space broadly while encouraging visits to rarely visited states over frequently visited ones. Following this strategy, we propose a new method that boosts exploration through an intrinsic reward based on a measurement of a state's novelty and the associated benefit of exploring that state, collectively called plausible novelty. By incentivizing exploration of plausible novel states, an actor-critic (AC) algorithm can improve its sample efficiency and, consequently, its training performance. The new method is verified through extensive simulations of continuous control tasks in MuJoCo environments, using a variety of prominent off-policy AC algorithms.
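The abstract describes the intrinsic reward only at a high level. As a rough illustration of the general idea, the sketch below combines a novelty term with a benefit term to shape the extrinsic reward before an actor-critic update. The count-based novelty proxy, the PlausibleNoveltyBonus class, the beta scale, and the critic-derived benefit stand-in are all assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

class PlausibleNoveltyBonus:
    """Hypothetical sketch: an intrinsic-reward bonus that favors rarely
    visited states whose exploration also looks beneficial. The count-based
    novelty measure and the value-based benefit weighting are illustrative
    assumptions, not the method proposed in the paper."""

    def __init__(self, n_bins=20, low=-1.0, high=1.0, beta=0.1):
        self.n_bins = n_bins
        self.low, self.high = low, high
        self.beta = beta    # scale of the intrinsic bonus (assumed)
        self.counts = {}    # visit counts per discretized state

    def _key(self, state):
        # Discretize the continuous state so visits can be counted.
        bins = np.clip(((state - self.low) / (self.high - self.low)
                        * self.n_bins).astype(int), 0, self.n_bins - 1)
        return tuple(bins.tolist())

    def bonus(self, state, benefit):
        # Novelty: decays with the visit count of the discretized state.
        key = self._key(np.asarray(state, dtype=float))
        self.counts[key] = self.counts.get(key, 0) + 1
        novelty = 1.0 / np.sqrt(self.counts[key])
        # Plausibility: weight novelty by an estimated benefit of
        # exploring the state (e.g., a critic-derived quantity).
        return self.beta * novelty * max(benefit, 0.0)


# Usage: shape the reward before feeding it to the actor-critic update.
bonus = PlausibleNoveltyBonus()
state, extrinsic_reward = np.array([0.3, -0.2]), 1.0
estimated_benefit = 0.5   # stand-in for a critic-based estimate
shaped_reward = extrinsic_reward + bonus.bonus(state, estimated_benefit)
print(shaped_reward)
```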