Information-theoretic bounds in learning algorithms
Bu, Yuheng
Description
- Title
- Information-theoretic bounds in learning algorithms
- Author(s)
- Bu, Yuheng
- Issue Date
- 2019-08-14
- Director of Research (if dissertation) or Advisor (if thesis)
- Veeravalli, Venugopal V.
- Doctoral Committee Chair(s)
- Veeravalli, Venugopal V.
- Committee Member(s)
- Raginsky, Maxim
- Varshney, Lav R.
- Small, Kevin
- Department of Study
- Electrical and Computer Engineering
- Discipline
- Electrical and Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- active and adaptive learning
- model change detection
- mutual information based generalization bounds
- model compression
- Abstract
- The focus of this thesis is on understanding machine learning algorithms from an information-theoretic point of view. Specifically, we apply information-theoretic tools to construct performance bounds for learning algorithms, with the goal of deepening the understanding of current algorithms and inspiring new learning techniques. The first problem considered involves a sequence of machine learning problems that vary in a bounded manner from one time-step to the next. To solve these problems accurately and data-efficiently, an active and adaptive learning framework is proposed, in which the labels of the most informative samples are actively queried from an unlabeled data pool, and adaptation to the change is achieved by utilizing the information acquired in previous steps. The goal is to satisfy a pre-specified bound on the excess risk at each time-step. The design of the active querying algorithm is based on minimizing the excess risk using stochastic gradient descent in the maximum likelihood estimation setting. Our algorithm and theoretical results are validated by experiments with synthetic and real data.
- To determine whether the active and adaptive learning framework is applicable in practice, we then study the problem of model change detection. Two sets of samples are generated according to a pre-change probabilistic model with parameter θ and a post-change model with parameter θ', respectively, and the goal is to detect whether the change in the model is significant. We construct an empirical difference test (EDT), which has low computational complexity, and we provide an approximation method for setting the threshold of the EDT to meet the false alarm constraint. Experiments with linear regression and logistic regression validate the proposed algorithms.
- Another key contribution of this thesis is in the area of mutual information-based generalization error bounds for supervised learning algorithms. Our bound is constructed in terms of the mutual information between each individual training sample and the output of the learning algorithm; it requires weaker conditions on the loss function and provides a tighter characterization of the generalization error than existing bounds. Examples are provided to demonstrate that the proposed bound is tighter and has a broader range of applicability.
- Finally, an application of this mutual information-based generalization error bound is considered. We show that model compression can improve the population risk of a pre-trained model by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk under model compression. We first prove that model compression reduces the mutual information-based generalization error bound, which allows model compression to be interpreted as a regularization technique that avoids overfitting. We then characterize the increase in empirical risk due to model compression using rate-distortion theory. A linear regression example shows that such an improvement in population risk due to model compression is indeed possible. Our theoretical results further suggest that the Hessian-weighted K-means clustering compression approach can be improved by regularizing the distance between the clustering centers. Experiments with neural networks support our theoretical assertions.
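For context, the individual-sample bound summarized above can be contrasted with the classical full-sample mutual information bound of Xu and Raginsky (2017). The following is a sketch in LaTeX, assuming the loss ℓ(w, Z) is σ-sub-Gaussian and the training samples Z_1, …, Z_n are i.i.d.; these are conditions consistent with this line of work, not text quoted from the thesis:

```latex
% Full-sample bound (Xu & Raginsky, 2017), with S = (Z_1, ..., Z_n):
\[
  \bigl|\,\mathbb{E}\!\left[\mathrm{gen}(W,S)\right]\bigr|
  \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)} .
\]
% Individual-sample bound of the kind described in the abstract:
\[
  \bigl|\,\mathbb{E}\!\left[\mathrm{gen}(W,S)\right]\bigr|
  \;\le\; \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^{2}\, I(W;Z_{i})} .
\]
% The second bound is never looser than the first: by Jensen's
% inequality (the square root is concave) and the independence of
% the samples,
%   (1/n) \sum_i \sqrt{2\sigma^2 I(W;Z_i)}
%     \le \sqrt{2\sigma^2 \cdot (1/n) \sum_i I(W;Z_i)}
%     \le \sqrt{(2\sigma^2/n) \, I(W;S)} .
```

The individual-sample form can also remain finite when the full-sample mutual information I(W;S) is infinite, which is one sense in which such a bound has a broader range of applicability.

The Hessian-weighted K-means compression mentioned in the last paragraph of the abstract can be sketched as below. This is a minimal illustration of importance-weighted K-means over scalar weights, not the thesis's implementation; the function name and the `center_penalty` regularizer are hypothetical stand-ins for the center-distance regularization the thesis suggests:

```python
import numpy as np

def hessian_weighted_kmeans(w, h, k, iters=50, center_penalty=0.0, seed=0):
    """Cluster scalar network weights w with importances h (e.g. the
    diagonal of the loss Hessian), approximately minimizing
    sum_j h_j * (w_j - c_{a(j)})**2 over centers c and assignments a.
    center_penalty > 0 shrinks centers toward their mean, a simple
    stand-in for regularizing the distance between clustering centers."""
    w, h = np.asarray(w, float), np.asarray(h, float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(w, size=k, replace=False)
    assign = np.zeros(len(w), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest center. For a single scalar weight,
        # a positive importance h_j does not change the argmin; it only
        # affects the center update below.
        assign = np.argmin((w[:, None] - centers[None, :]) ** 2, axis=1)
        for c in range(k):
            mask = assign == c
            if mask.any():
                # Importance-weighted centroid of cluster c.
                centers[c] = np.sum(h[mask] * w[mask]) / np.sum(h[mask])
        # Hypothetical regularization: pull centers toward their mean,
        # penalizing large spreads between cluster centers.
        if center_penalty > 0:
            centers = (1 - center_penalty) * centers \
                      + center_penalty * centers.mean()
    return centers[assign], assign  # quantized weights, cluster labels

# Usage sketch: quantize a weight vector to k shared values.
# w_q, labels = hessian_weighted_kmeans(weights, hess_diag, k=16)
```

Replacing each weight by its cluster center reduces the number of distinct values the learning algorithm's output can take, which is the mechanism by which compression tightens a mutual information-based generalization bound.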
- Graduation Semester
- 2019-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/106140
- Copyright and License Information
- Copyright 2019 Yuheng Bu
Owning Collections
- Graduate Dissertations and Theses at Illinois (primary)
- Dissertations and Theses - Electrical and Computer Engineering