Parametrized Stochastic Multi-armed Bandits with Binary Rewards
Jiang, Chong
Loading…
Permalink
https://hdl.handle.net/2142/18352
Description
Title
Parametrized Stochastic Multi-armed Bandits with Binary Rewards
Author(s)
Jiang, Chong
Issue Date
2011-01-14T22:47:13Z
Director of Research (if dissertation) or Advisor (if thesis)
Srikant, Rayadurgam
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
multi-armed bandits
machine learning
Abstract
In this thesis, we consider the problem of multi-armed bandits with a
large number of correlated arms. We assume that the arms have Bernoulli distributed rewards, independent across arms
and across time, where the probabilities of success are parametrized by known
attribute vectors for each arm, as well as an unknown preference vector.
For this model, we seek an algorithm with a total regret that
is sub-linear in time and independent of the number of arms. We present
such an algorithm, which we call the Three-phase Algorithm, and analyze
its performance. We show an upper bound on the total regret which applies uniformly in time.
The asymptotics of this bound show that for any $f \in \omega(\log(T))$, the total
regret can be made to be $O(f(T))$, independent of the number of arms.
Use this login method if you
don't
have an
@illinois.edu
email address.
(Oops, I do have one)
IDEALS migrated to a new platform on June 23, 2022. If you created
your account prior to this date, you will have to reset your password
using the forgot-password link below.