On upper confidence bound algorithms for piecewise-stationary stochastic multi-armed bandits and the variants

Wang, Lingda

On upper confidence bound algorithms for piecewise-stationary stochastic multi-armed bandits and the variants

Wang, Lingda

Permalink

https://hdl.handle.net/2142/106342

Description

Title

On upper confidence bound algorithms for piecewise-stationary stochastic multi-armed bandits and the variants

Author(s)

Wang, Lingda

Issue Date

2019-11-12

Director of Research (if dissertation) or Advisor (if thesis)

Zhao, Zhizhen

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Keyword(s)

Multi-Armed Bandits
Non-stationary Environments
Change-Point Detection

Abstract

In recent years, multi-armed bandit (MAB) problems have received much attention, as they model many real-world applications such as online recommendation, web search and crowdsourcing tasks. The core of MAB algorithms is addressing the exploration versus exploitation dilemma, and finding the right balance between them. Several simple yet effective algorithms (e.g., upper confidence bound (UCB) and Thompson sampling (TS)) have been proposed in the literature, which are order optimal compared with the lower bound. Original MAB problems are considered in a stationary environment, where the reward distributions do not evolve over time. Many real-world applications, however, have a non-stationary nature that cannot be fully characterized by the stationary settings. In this thesis, we mainly study the UCB-based algorithms for MAB problems and the variants under the scenario where the reward distributions can change in a piecewise-stationary manner. The variants considered in this thesis are combinatorial MAB (CMAB) and cascading bandit (CB) problems. In the first part of this thesis (Chapters 2, 3 and 4), we propose algorithms, \texttt{GLR-UCB}, \texttt{GLR-CUCB} and \texttt{GLR-CascadeUCB1}, for piecewise-stationary MAB, CMAB and CB problems, respectively. The key idea behind the proposed algorithms is incorporating an almost parameter-free change-point detector, the generalized likelihood ratio (GLR) change-point detector, within the classical \texttt{UCB1} algorithm and its variants (e.g., \texttt{CUCB} and \texttt{CascadeUCB1}). Gap-dependent regret upper bounds of the proposed algorithms are derived and all on the order of $\mathcal{O}(\sqrt{NLT\log{T}})$, where $N$ is the number of piecewise-stationary segments, $L$ is the number of arms in MAB, base arms in CMAB, or items in CB. We also present numerical experiments on both synthetic and real-world datasets to show that our proposed algorithms outperform other state-of-the-art algorithms in the literature. Next, in the second part (Chapter 5), we also derive a nearly matching regret lower bound on the order of $\Omega(\sqrt{NLT})$ for MAB problems, which improves the current best lower bound $\Omega(\sqrt{T})$ by adding the dependence on $L$ and $N$. Since CMAB and CB are variants of MAB, this lower bound also holds for CMAB and CB.

Graduation Semester

2019-12

Type of Resource

text

Permalink

http://hdl.handle.net/2142/106342

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Electrical and Computer Engineering

Dissertations and Theses in Electrical and Computer Engineering

On upper confidence bound algorithms for piecewise-stationary stochastic multi-armed bandits and the variants

Wang, Lingda

Permalink

Description

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Dissertations and Theses - Electrical and Computer Engineering

Log In