Journal Article

Adjacency Constraint for Efficient Hierarchical Reinforcement Learning
Document Type
Periodical
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4152-4166, Apr. 2023
Subject
Computing and Processing
Bioengineering
Task analysis
Reinforcement learning
Training
Random variables
Postal services
Markov processes
Games
Hierarchical reinforcement learning (HRL)
reinforcement learning (RL)
goal-conditioning
subgoal generation
adjacency constraint
Language
English
ISSN
0162-8828 (Print)
2160-9292 (CD)
1939-3539 (Electronic)
Abstract
Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency because the action space of the high-level policy, i.e., the goal space, is large. Searching in a large goal space poses difficulty for both high-level subgoal generation and low-level policy learning. In this article, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint. We theoretically prove that in a deterministic Markov Decision Process (MDP), the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP the adjacency constraint induces a bounded state-value suboptimality determined by the MDP's transition structure. We further show that this constraint can be practically implemented by training an adjacency network that can discriminate between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks, including challenging simulated robot locomotion and manipulation tasks, show that incorporating the adjacency constraint significantly boosts the performance of state-of-the-art goal-conditioned HRL approaches.
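The abstract's description of the method, an adjacency network that discriminates $k$-step adjacent from non-adjacent subgoals and a constraint that keeps generated subgoals near the current state, can be illustrated with a short sketch. The code below is a minimal, hypothetical PyTorch-style rendering under assumed design choices; the class and function names (AdjacencyNetwork, adjacency_training_loss, adjacency_penalty), network sizes, loss forms, and hyperparameters (k, eps) are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of an adjacency network and an adjacency penalty for
# high-level subgoal generation. Architecture and loss details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdjacencyNetwork(nn.Module):
    """Embeds states/goals so that embedding distance serves as a proxy for
    the number of environment steps needed to move between them."""

    def __init__(self, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def distance(self, s: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # Euclidean distance in embedding space approximates transition distance.
        return torch.norm(self.encoder(s) - self.encoder(g), dim=-1)


def adjacency_training_loss(net, s, g, is_adjacent, k: float = 10.0, eps: float = 1.0):
    """Contrastive-style loss: pull k-step adjacent pairs inside the radius k,
    push non-adjacent pairs outside it by a margin eps (both assumed values)."""
    d = net.distance(s, g)
    pos = F.relu(d - k)          # adjacent pairs should end up with d <= k
    neg = F.relu(k + eps - d)    # non-adjacent pairs should end up with d >= k + eps
    return torch.where(is_adjacent, pos, neg).mean()


def adjacency_penalty(net, s, subgoal, k: float = 10.0):
    """Hinge-style penalty on the high-level policy's output so that generated
    subgoals stay within the k-step adjacent region of the current state s."""
    return F.relu(net.distance(s, subgoal) - k).mean()


# Example usage with random data (state_dim = 8, batch of 16 labeled pairs):
if __name__ == "__main__":
    net = AdjacencyNetwork(state_dim=8)
    s, g = torch.randn(16, 8), torch.randn(16, 8)
    labels = torch.randint(0, 2, (16,), dtype=torch.bool)
    loss = adjacency_training_loss(net, s, g, labels)
    loss.backward()
```

In this reading, the penalty term would be added as an auxiliary loss to the high-level policy objective, which is one plausible way to realize the constraint described in the abstract; the threshold k and margin eps shown here are placeholders rather than values reported in the paper.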