Learning from expert advice framework: Algorithms and applications

Truong, Anh

Learning from expert advice framework: Algorithms and applications

Truong, Anh

Permalink

https://hdl.handle.net/2142/100961

Description

Title

Learning from expert advice framework: Algorithms and applications

Author(s)

Truong, Anh

Issue Date

2018-04-13

Director of Research (if dissertation) or Advisor (if thesis)

Kiyavash, Negar

Doctoral Committee Chair(s)

Kiyavash, Negar

Committee Member(s)

Sreenivas, Ramavarapu S.
Do, Minh
Marla, Lavanya
Etesami, S. Rasoul

Department of Study

Industrial&Enterprise Sys Eng

Discipline

Industrial Engineering

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Learning with expert advice, sleeping expert, adversarial strategies, malicious experts, MDP, efficient labeling, simple regret

Abstract

"Online recommendation systems have been widely used by retailers, digital marketing, and especially in e-commerce applications. Popular sites such as Netflix and Amazon suggest movies or general merchandise to their clients based on recommendations from peers. At core of recommendation systems resides a prediction algorithm, which based on recommendations received from a set of experts (users), recommends objects to other users. After a user ``consumes"" an object, his feedback provided to the system is used to assess the performance of experts at that round and adjust the predictions of the recommendation system for the future rounds. This so-called ``learning from expert advice'' framework has been extensively studied in the literature. In this dissertation, we investigate various settings and applications ranging from partial information, adversarial scenarios, to limited resources. We propose provable algorithms for such systems, along with theoretical and experimental results. In the first part of the thesis, we focus our attention to a generalized model of learning from expert advice in which experts could abstain from participating at some rounds. Our proposed online algorithm falls into the class of weighted average predictors and uses a time varying multiplicative weight update rule. This update rule changes the weight of an expert based on his relative performance compared to the average performance of available experts at the current round. We prove the convergence of our algorithm to the best expert, defined in terms of both availability and accuracy, in the stochastic setting. Next, we study the optimal adversarial strategies against the weighted average prediction algorithm. All but one expert are honest and the malicious expert's goal is to sabotage the performance of the algorithm by strategically providing dishonest recommendations. We formulate the problem as a Markov decision process (MDP) and apply policy iteration to solve it. For the logarithmic loss, we prove that the optimal strategy for the adversary is the greedy policy, whereas for the absolute loss, in the $2$-experts, discounted cost setting, we prove that the optimal strategy is a threshold policy. We extend the results to the infinite horizon problem and find the exact thresholds for the stationary optimal policy. As an effort to investigate the extended problem, we use a mean field approach in the $N$-experts setting to find the optimal strategy when the predictions of the honest experts are i.i.d. In addition to designing an effective weight update rule and investigating optimal strategies of malicious experts, we also consider active learning applications for learning with expert advice framework. In this application, the target is to reduce the number of labeling while still keeping the regret bound as small as possible. We proposed two algorithms, EPSL and EPAL, which are able to efficiently request label for each object. In essence, the idea of two algorithms is to examine the opinion ranges of experts, and decide to acquire labels based on the maximum difference of those opinion using a randomized policy. Both algorithms obtain nearly optimal regret bound up to some constant depending on the characteristics of experts' predictions. Last but not least, we turn our attention to the generalized ``best arm identification"" problem in which, at each time, there is a subset of products whose rewards or profits are unknown (but follow some fixed distributions), and the goal is to select the best product to recommend to users after trying on a number of sampling. We propose UCB based (Upper Confidence Bound) algorithms that provide flexible parameter tuning based on the availability of each arm in the collection. We also propose a simple, yet efficient, uniform sampling algorithm for this problem. We proved that, for these algorithms, the error of selecting the incorrect arm decays exponentially over time."

Graduation Semester

2018-05

Type of Resource

text

Permalink

http://hdl.handle.net/2142/100961

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Learning from expert advice framework: Algorithms and applications

Truong, Anh

Permalink

Description

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Dissertations and Theses - Industrial and Enterprise Systems Engineering

Log In