Academic Paper

$\mathbf{(N,K)}$-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model

Document Type

Working Paper

Author

Zhang, Yufeng; Chen, Liyu; Liu, Boyi; Yang, Yingxiang; Cui, Qiwen; Tao, Yunzhe; Yang, Hongxia

Source

Subject

Computer Science - Machine Learning
Computer Science - Artificial Intelligence
Computer Science - Computation and Language

Language

Abstract

Recent advances in reinforcement learning (RL) algorithms aim to enhance the performance of language models at scale. Yet, there is a noticeable absence of a cost-effective and standardized testbed tailored to evaluating and comparing these algorithms. To bridge this gap, we present a generalized version of the 24-Puzzle: the $(N,K)$-Puzzle, which challenges language models to reach a target value $K$ with $N$ integers. We evaluate the effectiveness of established RL algorithms such as Proximal Policy Optimization (PPO), alongside novel approaches like Identity Preference Optimization (IPO) and Direct Preference Optimization (DPO).
Comment: 8 pages
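
To make the puzzle in the abstract concrete, here is a minimal sketch of a brute-force solver. It is not taken from the paper. It assumes the classic 24-Puzzle rules: each of the $N$ integers is used exactly once, in any order, combined with +, -, *, / and arbitrary parentheses. The paper's exact rules may differ, and the names `solve_nk_puzzle` and `_search` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_nk_puzzle(numbers, target):
    """Return an expression over `numbers` that equals `target`, or None.

    Assumed rules (classic 24-Puzzle style, not confirmed by the source):
    every number is used exactly once, in any order, with + - * /.
    """
    # Fractions keep division exact, so 8/(3-8/3) is recognised as 24.
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: one value left; it either hits the target or it does not.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None

    # Pick any two values, combine them, and recurse on the shorter list.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea}+{eb})"),
            (a - b, f"({ea}-{eb})"),
            (b - a, f"({eb}-{ea})"),
            (a * b, f"({ea}*{eb})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea}/{eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb}/{ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None
```

Under these assumptions, `solve_nk_puzzle([1, 2, 3, 4], 24)` returns a fully parenthesised expression that evaluates to 24, while `solve_nk_puzzle([1, 1, 1, 1], 24)` returns `None` because no combination reaches the target. The search is exhaustive, so it is only practical for small $N$.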

Online Access

Open Access (arXiv)
