Withdraw
Loading…
On upper confidence bound algorithms for piecewise-stationary stochastic multi-armed bandits and the variants
Wang, Lingda
Loading…
Permalink
https://hdl.handle.net/2142/106342
Description
- Title
- On upper confidence bound algorithms for piecewise-stationary stochastic multi-armed bandits and the variants
- Author(s)
- Wang, Lingda
- Issue Date
- 2019-11-12
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhao, Zhizhen
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Multi-Armed Bandits
- Non-stationary Environments
- Change-Point Detection
- Abstract
- In recent years, multi-armed bandit (MAB) problems have received much attention, as they model many real-world applications such as online recommendation, web search and crowdsourcing tasks. The core of MAB algorithms is addressing the exploration versus exploitation dilemma, and finding the right balance between them. Several simple yet effective algorithms (e.g., upper confidence bound (UCB) and Thompson sampling (TS)) have been proposed in the literature, which are order optimal compared with the lower bound. Original MAB problems are considered in a stationary environment, where the reward distributions do not evolve over time. Many real-world applications, however, have a non-stationary nature that cannot be fully characterized by the stationary settings. In this thesis, we mainly study the UCB-based algorithms for MAB problems and the variants under the scenario where the reward distributions can change in a piecewise-stationary manner. The variants considered in this thesis are combinatorial MAB (CMAB) and cascading bandit (CB) problems. In the first part of this thesis (Chapters 2, 3 and 4), we propose algorithms, \texttt{GLR-UCB}, \texttt{GLR-CUCB} and \texttt{GLR-CascadeUCB1}, for piecewise-stationary MAB, CMAB and CB problems, respectively. The key idea behind the proposed algorithms is incorporating an almost parameter-free change-point detector, the generalized likelihood ratio (GLR) change-point detector, within the classical \texttt{UCB1} algorithm and its variants (e.g., \texttt{CUCB} and \texttt{CascadeUCB1}). Gap-dependent regret upper bounds of the proposed algorithms are derived and all on the order of $\mathcal{O}(\sqrt{NLT\log{T}})$, where $N$ is the number of piecewise-stationary segments, $L$ is the number of arms in MAB, base arms in CMAB, or items in CB. We also present numerical experiments on both synthetic and real-world datasets to show that our proposed algorithms outperform other state-of-the-art algorithms in the literature. Next, in the second part (Chapter 5), we also derive a nearly matching regret lower bound on the order of $\Omega(\sqrt{NLT})$ for MAB problems, which improves the current best lower bound $\Omega(\sqrt{T})$ by adding the dependence on $L$ and $N$. Since CMAB and CB are variants of MAB, this lower bound also holds for CMAB and CB.
- Graduation Semester
- 2019-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/106342
- Copyright and License Information
- Copyright 2019 Lingda Wang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisDissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer EngineeringManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…