Information-theoretic bounds in learning algorithms
Bu, Yuheng
Description
- Title
- Information-theoretic bounds in learning algorithms
- Author(s)
- Bu, Yuheng
- Issue Date
- 2019-08-14
- Director of Research (if dissertation) or Advisor (if thesis)
- Veeravalli, Venugopal V.
- Doctoral Committee Chair(s)
- Veeravalli, Venugopal V.
- Committee Member(s)
- Raginsky, Maxim
- Varshney, Lav R.
- Small, Kevin
- Department of Study
- Electrical and Computer Engineering
- Discipline
- Electrical and Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- active and adaptive learning
- model change detection
- mutual information based generalization bounds
- model compression
- Abstract
- The focus of this thesis is on understanding machine learning algorithms from an information-theoretic point of view. Specifically, we apply information-theoretic tools to construct performance bounds for learning algorithms, with the goal of deepening the understanding of current algorithms and inspiring new learning techniques. The first problem considered involves a sequence of machine learning problems that vary in a bounded manner from one time-step to the next. To solve these problems accurately and data-efficiently, an active and adaptive learning framework is proposed, in which the labels of the most informative samples are actively queried from an unlabeled data pool, and adaptation to the change is achieved by utilizing the information acquired in previous steps. The goal is to satisfy a pre-specified bound on the excess risk at each time-step. The design of the active querying algorithm is based on minimizing the excess risk using stochastic gradient descent in the maximum likelihood estimation setting. Our algorithm and theoretical results are validated by experiments with synthetic and real data.
- To determine whether the active and adaptive learning framework is applicable in practice, we then study the problem of model change detection. Two sets of samples are generated according to a pre-change probabilistic model with parameter θ and a post-change model with parameter θ', respectively, and the goal is to detect whether the change in the model is significant. We construct an empirical difference test (EDT), which has low computational complexity, and we provide an approximation method for setting the threshold of the EDT to meet the false alarm constraint. Experiments with linear regression and logistic regression validate the proposed algorithms.
- Another key contribution of this thesis is in the area of mutual information-based generalization error bounds for supervised learning algorithms. Our bound is constructed in terms of the mutual information between each individual training sample and the output of the learning algorithm; it requires weaker conditions on the loss function and provides a tighter characterization of the generalization error than existing bounds. Examples are provided to demonstrate that the proposed bound is tighter and has a broader range of applicability.
- Finally, an application of this mutual information-based generalization error bound is considered. We show that model compression can improve the population risk of a pre-trained model by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk under model compression. We first prove that model compression reduces the mutual information-based generalization error bound, which allows model compression to be interpreted as a regularization technique that avoids overfitting. We then characterize the increase in empirical risk due to model compression using rate-distortion theory. A linear regression example shows that such an improvement in population risk due to model compression is indeed possible. Our theoretical results further suggest that the Hessian-weighted K-means clustering compression approach can be improved by regularizing the distance between the clustering centers. Experiments with neural networks support our theoretical assertions.
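For context, the individual-sample bound summarized above can be contrasted with the classical full-sample mutual information bound of Xu and Raginsky (2017). The following is a sketch in LaTeX, assuming the loss ℓ(w, Z) is σ-sub-Gaussian and the training samples Z_1, …, Z_n are i.i.d.; these are conditions consistent with this line of work, not text quoted from the thesis:

```latex
% Full-sample bound (Xu & Raginsky, 2017), with S = (Z_1, ..., Z_n):
\[
  \bigl|\,\mathbb{E}\!\left[\mathrm{gen}(W,S)\right]\bigr|
  \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)} .
\]
% Individual-sample bound of the kind described in the abstract:
\[
  \bigl|\,\mathbb{E}\!\left[\mathrm{gen}(W,S)\right]\bigr|
  \;\le\; \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^{2}\, I(W;Z_{i})} .
\]
% The second bound is never looser than the first: by Jensen's
% inequality (the square root is concave) and the independence of
% the samples,
%   (1/n) \sum_i \sqrt{2\sigma^2 I(W;Z_i)}
%     \le \sqrt{2\sigma^2 \cdot (1/n) \sum_i I(W;Z_i)}
%     \le \sqrt{(2\sigma^2/n) \, I(W;S)} .
```

The individual-sample form can also remain finite when the full-sample mutual information I(W;S) is infinite, which is one sense in which such a bound has a broader range of applicability.

The Hessian-weighted K-means compression mentioned in the last paragraph of the abstract can be sketched as below. This is a minimal illustration of importance-weighted K-means over scalar weights, not the thesis's implementation; the function name and the `center_penalty` regularizer are hypothetical stand-ins for the center-distance regularization the thesis suggests:

```python
import numpy as np

def hessian_weighted_kmeans(w, h, k, iters=50, center_penalty=0.0, seed=0):
    """Cluster scalar network weights w with importances h (e.g. the
    diagonal of the loss Hessian), approximately minimizing
    sum_j h_j * (w_j - c_{a(j)})**2 over centers c and assignments a.
    center_penalty > 0 shrinks centers toward their mean, a simple
    stand-in for regularizing the distance between clustering centers."""
    w, h = np.asarray(w, float), np.asarray(h, float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(w, size=k, replace=False)
    assign = np.zeros(len(w), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest center. For a single scalar weight,
        # a positive importance h_j does not change the argmin; it only
        # affects the center update below.
        assign = np.argmin((w[:, None] - centers[None, :]) ** 2, axis=1)
        for c in range(k):
            mask = assign == c
            if mask.any():
                # Importance-weighted centroid of cluster c.
                centers[c] = np.sum(h[mask] * w[mask]) / np.sum(h[mask])
        # Hypothetical regularization: pull centers toward their mean,
        # penalizing large spreads between cluster centers.
        if center_penalty > 0:
            centers = (1 - center_penalty) * centers \
                      + center_penalty * centers.mean()
    return centers[assign], assign  # quantized weights, cluster labels

# Usage sketch: quantize a weight vector to k shared values.
# w_q, labels = hessian_weighted_kmeans(weights, hess_diag, k=16)
```

Replacing each weight by its cluster center reduces the number of distinct values the learning algorithm's output can take, which is the mechanism by which compression tightens a mutual information-based generalization bound.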
- Graduation Semester
- 2019-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/106140
- Copyright and License Information
- Copyright 2019 Yuheng Bu
Owning Collections
- Graduate Dissertations and Theses at Illinois (primary)
- Dissertations and Theses - Electrical and Computer Engineering