Academic Paper

Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States
Document Type
Conference
Source
2023 62nd IEEE Conference on Decision and Control (CDC), pp. 7009-7014, Dec. 2023
Subject
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Training
Reinforcement learning
Boosting
Stability analysis
Task analysis
Optimization
Language
English
ISSN
2576-2370
Abstract
Improving exploration and exploitation through more efficient use of samples is a critical issue in reinforcement learning algorithms. A basic strategy for a learning algorithm is to explore the entire environment state space broadly while encouraging visits to rarely visited states over frequently visited ones. Following this strategy, we propose a new method that boosts exploration through an intrinsic reward based on a measurement of a state's novelty and the associated benefit of exploring that state, collectively called plausible novelty. By incentivizing exploration of plausible novel states, an actor-critic (AC) algorithm can improve its sample efficiency and, consequently, its training performance. The new method is verified through extensive simulations of continuous control tasks in MuJoCo environments, using a variety of prominent off-policy AC algorithms.
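The abstract describes the intrinsic reward only at a high level. As a rough illustration of the general idea, the sketch below combines a novelty term with a benefit term to shape the extrinsic reward before an actor-critic update. The count-based novelty proxy, the PlausibleNoveltyBonus class, the beta scale, and the critic-derived benefit stand-in are all assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

class PlausibleNoveltyBonus:
    """Hypothetical sketch: an intrinsic-reward bonus that favors rarely
    visited states whose exploration also looks beneficial. The count-based
    novelty measure and the value-based benefit weighting are illustrative
    assumptions, not the method proposed in the paper."""

    def __init__(self, n_bins=20, low=-1.0, high=1.0, beta=0.1):
        self.n_bins = n_bins
        self.low, self.high = low, high
        self.beta = beta    # scale of the intrinsic bonus (assumed)
        self.counts = {}    # visit counts per discretized state

    def _key(self, state):
        # Discretize the continuous state so visits can be counted.
        bins = np.clip(((state - self.low) / (self.high - self.low)
                        * self.n_bins).astype(int), 0, self.n_bins - 1)
        return tuple(bins.tolist())

    def bonus(self, state, benefit):
        # Novelty: decays with the visit count of the discretized state.
        key = self._key(np.asarray(state, dtype=float))
        self.counts[key] = self.counts.get(key, 0) + 1
        novelty = 1.0 / np.sqrt(self.counts[key])
        # Plausibility: weight novelty by an estimated benefit of
        # exploring the state (e.g., a critic-derived quantity).
        return self.beta * novelty * max(benefit, 0.0)


# Usage: shape the reward before feeding it to the actor-critic update.
bonus = PlausibleNoveltyBonus()
state, extrinsic_reward = np.array([0.3, -0.2]), 1.0
estimated_benefit = 0.5   # stand-in for a critic-based estimate
shaped_reward = extrinsic_reward + bonus.bonus(state, estimated_benefit)
print(shaped_reward)
```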