Interactive learning in nonstationary environments

Zhou, Huozhi

Interactive learning in nonstationary environments

Zhou, Huozhi

This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with your NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.

Permalink

https://hdl.handle.net/2142/117635

Description

Title

Interactive learning in nonstationary environments

Author(s)

Zhou, Huozhi

Issue Date

2022-10-14

Director of Research (if dissertation) or Advisor (if thesis)

Varshney, Lav R.

Doctoral Committee Chair(s)

Varshney, Lav R.

Committee Member(s)

Veeravalli, Venugopal V.
Srikant, Rayadurgam
Milenkovic, Olgica

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Interactive Learning
Nonstationarity
Online Learning

Abstract

This dissertation focuses on understanding optimal and principled interaction strategies to make decisions under nonstationary or non independent and identically distributed (i.i.d.) environments through online feedback. Previous works typically assume the environments are stationary or fully adversarial, but these assumptions are too extreme for certain practical problems of interest. For example, in network routing the paths’ conditions are more likely to be slowly time-varying, rather than being stationary or fully adversarial. We aim to fill the gap for the existing literature by studying how to strategically interact with the slowly time-varying environments to achieve the optimal statistical rates. In particular, we apply several techniques including change point detection, adaptive restart, and distribution mismatch regularization to handle the increasing difficulty of balancing the exploration-exploitation tradeoff. These techniques are simple but effective, in the sense they almost match the information-theoretic performance lower bounds we established via change of measure method. The techniques introduced in this dissertation have strong theoretical guarantees that can deepen our understanding on how to design decision systems for nonstationary environments. The first problem considered is piecewise-stationary combinatorial semi- bandits, where each arm has a combinatorial structure that aggregates the rewards of related base arms and the rewards change abruptly at unknown time steps. It has a wide range of applications, including online recommendation, network routing, and ranking. We show that adding an efficient change point detector on top of the combination of uniform exploration and optimistic principle suffices to achieve the minimax dynamic regret bound up to polylogarithmic factors, which yields our proposed algorithm GLR-CUCB. Numerical experiments demonstrate the effectiveness of GLR-CUCB. Next we study a slightly harder instance of bandit problems, the piecewise-stationary cascading bandit, where a learner aims to learn the K most attractive items out of a ground set of size L through online feedback. The key difference from piecewise-stationary combinatorial semi-bandit is that the learner only has access to censored feedback of the pulled arm. More specifically, given the pulled arm (an ordered list of L items), the learner only receives feedback of the top K items in this ordered list. Surprisingly, we show that using the same strategy introduced for piecewise-stationary combinatorial semi-bandits can achieve the same near-optimal dynamic regret bound as GLR-CUCB. Thus, we see that the key difficulty in balancing exploration-exploitation comes from the nonstationarity of the environment, rather than the censored feedback. Next, we study the adversarial graphical contextual bandit, a variant of adversarial multi-armed bandit. In this setting, the rewards of arms at each time step can be adversarially chosen, which is the extreme case of nonstationarity. Under this setting, the agent not only can observe the loss of the pulled arm, but also the loss of the neighbors of the pulled arm (side observation). The key difficulty is how to leverage the side observation to reduce the variance of the loss estimators for the unobserved arms. We propose novel loss estimators to incorporate the information of side observations, which leads to two new algorithms EXP3-LGC-U and EXP3-LGC-IX for undirected graphs and directed graphs, based on EXP3. Under mild conditions, these two algorithms can achieve rate-optimal regret. Next, we study exploration for nonstationary linear Markov decision processes. This is more challenging than bandit problems due to the credit assignment in long planning horizon. We propose two algorithms LSVI-UCB-Restart and Ada-LSVI-UCB-Restart, which are based on the principles of combining optimism in the face of uncertainty and passive/adaptive restart; their performance almost matches the minimax dynamic regret lower bound. Numerical experiments demonstrate the effectiveness of the proposed algorithms. Finally, we study active learning under the imbalanced dataset setting, where we try to minimize the labeling cost of training a machine learning model. We derive a new class-dependent generalization bound for active learning, which motivates the design of the proposed algorithm GBMAL. The key idea is to iteratively select the new batch data that approximately minimizes the generalization bound, which is equivalent to adding two additional regularization terms on top of existing active learning algorithms. Numerical experiments on various datasets show that GBMAL can reduce the labeling cost significantly compared to state-of-the-art active learning algorithms.

Graduation Semester

2022-12

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/117635

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Interactive learning in nonstationary environments

Zhou, Huozhi

Permalink

Description

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Log In