Academic Article

Concentration of contractive stochastic approximation and reinforcement learning.

Document Type

Journal

Author

Chandak, Siddharth (1-STF-E); Borkar, Vivek S. (6-IIT-EE); Dodhia, Parth (6-IIT-EE)

Source

Stochastic Systems (Stoch. Syst.) (2022), 12, no. 4, 411-430. eISSN: 1946-5238.

Subject

90 Operations research, mathematical programming -- 90C Mathematical programming
90C39 Dynamic programming

Language

English

Abstract

Summary: "Using a martingale concentration inequality, concentration bounds 'from time $n_0$ on' are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises. These are applied to reinforcement learning algorithms, in particular to asynchronous Q-learning and TD(0)."
