Academic Article

Empowering the Diversity and Individuality of Option: Residual Soft Option Critic Framework
Document Type
Periodical
Source
IEEE Transactions on Neural Networks and Learning Systems, 34(8):4816-4825, Aug. 2023
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
General Topics for Engineers
Entropy
Task analysis
Reinforcement learning
Mutual information
Diversity reception
Convergence
Games
Keywords
Deep reinforcement learning (RL)
diversity and individuality
hierarchical RL (HRL)
option critic
residual
Language
English
ISSN
2162-237X (Print)
2162-2388 (Electronic)
Abstract
Extracting temporal abstractions (options), which enrich the action space, is a crucial challenge in hierarchical reinforcement learning. With a well-structured action space, decision-making agents can search more deeply or plan more efficiently by pruning irrelevant action candidates. However, automatically capturing a well-performing temporal abstraction is nontrivial owing to insufficient exploration and inadequate functionality of the learned options. We alleviate this challenge from two perspectives, i.e., diversity and individuality. For diversity, we propose a maximum entropy model based on ensembled options to encourage exploration. For individuality, we propose to distinguish each option accurately, utilizing mutual information minimization, so that each option can better express itself and function. We name our framework ensemble with soft option (ESO) critics. Furthermore, the residual algorithm (RA) with a bidirectional target network is introduced to stabilize bootstrapping, yielding a residual version of ESO. We provide a detailed analysis of extensive experiments, which shows that our method boosts performance on commonly used continuous control tasks.
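The abstract names three ingredients: an entropy bonus over an ensemble of options (diversity), a mutual-information-based term that separates the options (individuality), and a residual-algorithm bootstrap stabilized by a target network. The sketch below illustrates how such a combined loss could be wired up in PyTorch. It is an assumption-laden illustration, not the authors' implementation: every name and coefficient in it (SoftOptionCritic, n_options, alpha, beta, eta) is hypothetical, and the distance-based separation term merely stands in for the paper's mutual-information objective, whose exact form the abstract does not give.

```python
# Illustrative sketch only (not the authors' code): a soft option-critic loss
# combining (i) an entropy bonus over the option policy, (ii) a separation term
# between option actors as a proxy for mutual-information minimization, and
# (iii) a residual-style bootstrap mixing a frozen target network with the
# online network. All module and parameter names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftOptionCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, n_options=4, hidden=64):
        super().__init__()
        # Per-option Q-values for a state-action pair.
        self.q = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, n_options))
        # Option-selection policy over the n_options options.
        self.option_logits = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                           nn.Linear(hidden, n_options))
        # One deterministic actor head per option (action means only, for brevity).
        self.actors = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, act_dim))
            for _ in range(n_options)])

    def forward(self, obs, act):
        return self.q(torch.cat([obs, act], dim=-1))    # (batch, n_options)


def loss_sketch(model, target_model, batch,
                gamma=0.99, alpha=0.1, beta=0.01, eta=0.5):
    obs, act, rew, next_obs, done, option = batch       # option: (batch, 1) indices

    # Diversity: entropy bonus over the option-selection policy.
    log_pi_w = F.log_softmax(model.option_logits(obs), dim=-1)
    entropy = -(log_pi_w.exp() * log_pi_w).sum(-1).mean()

    # Individuality: encourage different options to propose distinguishable
    # actions in the same state (a crude stand-in for the paper's
    # mutual-information objective).
    means = torch.stack([actor(obs) for actor in model.actors], dim=1)  # (B, K, A)
    separation = torch.cdist(means, means).mean()

    # Soft value of a state: expected per-option Q under the option policy,
    # each option evaluated at its own proposed action.
    def soft_value(net, s):
        pi = F.softmax(net.option_logits(s), dim=-1)                     # (B, K)
        qs = torch.stack([net(s, actor(s))[:, k]
                          for k, actor in enumerate(net.actors)], dim=-1)
        return (pi * qs).sum(-1, keepdim=True)                           # (B, 1)

    # Residual-style bootstrap: mix a frozen target-network estimate with a
    # differentiable online estimate; eta weights the residual-gradient part.
    with torch.no_grad():
        v_next_tgt = soft_value(target_model, next_obs)
    v_next_online = soft_value(model, next_obs)
    v_next = (1.0 - eta) * v_next_tgt + eta * v_next_online
    target = rew + gamma * (1.0 - done) * v_next

    q_sa = model(obs, act).gather(1, option.long())                      # (B, 1)
    td_loss = ((q_sa - target) ** 2).mean()

    return td_loss - alpha * entropy - beta * separation
```

In this sketch, target_model would be a delayed copy of model that is periodically or softly synchronized; the "bidirectional" target-network scheme mentioned in the abstract is not detailed there, so only the basic residual mixing (gradient flowing through the online bootstrap term, weighted by eta) is shown.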