Academic paper
Concentration of contractive stochastic approximation and reinforcement learning.
Document Type
Journal
Author
Chandak, Siddharth (1-STF-E); Borkar, Vivek S. (6-IIT-EE); Dodhia, Parth (6-IIT-EE)
Source
Subject
90 Operations research, mathematical programming -- 90C Mathematical programming
90C39 Dynamic programming
Language
English
Abstract
Summary: "Using a martingale concentration inequality, concentration bounds 'from time $n_0$ on' are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises. These are applied to reinforcement learning algorithms, in particular to asynchronous Q-learning and TD(0)."
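To illustrate the setting the summary refers to, here is a minimal sketch (not the authors' code or experimental setup) of tabular TD(0) viewed as a contractive stochastic approximation. The iterate is the value vector V, and the driving map is the Bellman operator F(V) = r + gamma * P V, a gamma-contraction in the sup norm. Only the currently visited state is updated, which makes the scheme asynchronous, and the state sequence is generated by a Markov chain. The 2-state chain, the rewards r, the discount gamma = 0.5, and the step-size rule a = 1 / visits**0.7 are all hypothetical choices made only for this example. For these choices the fixed point is V = (24/13, 4/13).

```python
import random


def td0_two_state(num_steps: int = 200_000, gamma: float = 0.5, seed: int = 0) -> list[float]:
    """Tabular TD(0) on a hypothetical 2-state Markov chain.

    Written as a stochastic approximation
        V[s] <- V[s] + a * (r[s] + gamma * V[s_next] - V[s]),
    whose mean field r + gamma * P V - V is driven by a gamma-contraction.
    All parameters below are illustrative assumptions, not taken from the paper.
    """
    P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical transition matrix of the chain
    r = [1.0, 0.0]                # hypothetical per-state reward
    rng = random.Random(seed)     # seeded so runs are reproducible
    V = [0.0, 0.0]                # value estimates (the SA iterate)
    visits = [0, 0]               # per-state visit counters -> asynchronous step sizes
    s = 0
    for _ in range(num_steps):
        # Sample the next state from row P[s]; the state sequence is a Markov chain.
        s_next = 0 if rng.random() < P[s][0] else 1
        visits[s] += 1
        # Diminishing step size: sum of a diverges, sum of a**2 converges (exponent 0.7 > 0.5).
        a = 1.0 / visits[s] ** 0.7
        # TD(0) update: only the visited component of V changes (asynchronous update).
        td_target = r[s] + gamma * V[s_next]
        V[s] += a * (td_target - V[s])
        s = s_next
    return V


if __name__ == "__main__":
    V = td0_two_state()
    print("TD(0) estimate:", V)
    print("exact fixed point:", [24 / 13, 4 / 13])
```

The per-state counters mirror the asynchronous character of tabular TD(0) and Q-learning: each component is updated only when its state is visited, so its effective step-size sequence depends on the visit frequencies of the chain.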