Academic article

Concentration of contractive stochastic approximation and reinforcement learning.
Document Type
Journal
Author
Chandak, Siddharth (1-STF-E); Borkar, Vivek S. (6-IIT-EE); Dodhia, Parth (6-IIT-EE)
Source
Stochastic Systems (Stoch. Syst.) (2022), 12, no. 4, 411-430. eISSN: 1946-5238.
Subject
90 Operations research, mathematical programming -- 90C Mathematical programming
  90C39 Dynamic programming
Language
English
Abstract
Summary: "Using a martingale concentration inequality, concentration bounds 'from time $n_0$ on' are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises. These are applied to reinforcement learning algorithms, in particular to asynchronous Q-learning and TD(0)."