Journal Article

Offline–Online Actor–Critic
Document Type
Periodical
Source
IEEE Transactions on Artificial Intelligence, 5(1):61-69, Jan. 2024
Subject
Computing and Processing
Training
Degradation
Reinforcement learning
Cloning
Algorithm design and analysis
Actor–critic
behavior clone (BC) constraint
distribution shift
offline–online reinforcement learning (RL)
policy performance degradation
Language
English
ISSN
2691-4581
Abstract
Offline–online reinforcement learning (RL) can effectively address the problem of missing data (i.e., transitions) in offline RL. However, due to distribution shift, the policy's performance may degrade when an agent moves from the offline to the online training phase. In this article, we first analyze the problems of distribution shift and policy performance degradation in offline–online RL. Then, to alleviate these problems, we propose a novel offline–online actor–critic (O2AC) algorithm. In O2AC, a behavior clone constraint term is introduced into the policy objective function to address distribution shift in the offline training phase. In addition, in the online training phase, the influence of the behavior clone constraint term is gradually reduced, which alleviates policy performance degradation. Experiments show that O2AC outperforms existing offline–online RL algorithms.
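
A minimal sketch (in PyTorch) of what a BC-constrained actor objective with a gradually reduced constraint weight might look like, assuming a TD3+BC-style quadratic behavior-clone penalty and a linear decay once online fine-tuning begins; the function names, penalty form, and schedule are illustrative assumptions, not the paper's exact O2AC formulation:

import torch
import torch.nn.functional as F

def bc_constrained_actor_loss(critic, actor, states, dataset_actions, bc_weight):
    # Actor-critic term: ascend the critic's value estimate for the
    # policy's own actions (minimize the negative Q-value).
    policy_actions = actor(states)
    rl_term = -critic(states, policy_actions).mean()
    # Behavior-clone constraint term: keep the policy close to the actions
    # logged in the offline dataset, which limits distribution shift.
    bc_term = F.mse_loss(policy_actions, dataset_actions)
    return rl_term + bc_weight * bc_term

def bc_weight_schedule(step, offline_steps, decay_steps, alpha0=2.5):
    # Full BC weight during offline training; once online fine-tuning
    # starts, decay the weight linearly toward zero (an illustrative
    # schedule, not necessarily the paper's exact one).
    if step < offline_steps:
        return alpha0
    progress = min(1.0, (step - offline_steps) / max(1, decay_steps))
    return alpha0 * (1.0 - progress)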