Academic Article

The K-armed dueling bandits problem

Document Type

Article

Author

Yue, Yisong; Broder, Josef; Kleinberg, Robert; Joachims, Thorsten

Source

Journal of Computer & System Sciences. Sep 2012, Vol. 78, Issue 5, pp. 1538-1556. 19 pp.

Subject

*INFORMATION theory
*COMPUTER algorithms
*DISTANCE education
*LEARNING problems
*INFORMATION retrieval
*MEASURE theory

Language

ISSN

0022-0000

Abstract

We study a partial-information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches that require the absolute reward of the chosen strategy to be quantifiable and observable, our setting assumes only that (noisy) binary feedback about the relative reward of two chosen strategies is available. This type of relative feedback is particularly appropriate in applications where absolute rewards have no natural scale or are difficult to measure (e.g., user-perceived quality of a set of retrieval results, taste of food, product attractiveness), but where pairwise comparisons are easy to make. We propose a novel regret formulation in this setting, as well as present an algorithm that achieves information-theoretically optimal regret bounds (up to a constant factor). [Copyright © Elsevier]

Online Access

Full Text (ScienceDirect pay-per-use); Full Text (ScienceDirect O/A); Web of Science; JCR journal information; Scopus; Find it@PNU
