The implicit bias of gradient descent: from linear classifiers to deep networks
Ji, Ziwei
Permalink
https://hdl.handle.net/2142/115475
Description
- Title
- The implicit bias of gradient descent: from linear classifiers to deep networks
- Author(s)
- Ji, Ziwei
- Issue Date
- 2022-04-18
- Director of Research (if dissertation) or Advisor (if thesis)
- Telgarsky, Matus
- Doctoral Committee Chair(s)
- Telgarsky, Matus
- Committee Member(s)
- Chekuri, Chandra
- Raginsky, Maxim
- Hsu, Daniel
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Implicit bias
- test error bounds
- gradient descent
- stochastic gradient descent
- primal-dual analysis
- Abstract
- Gradient descent and other first-order methods are used extensively in deep learning. They can find solutions with not only low training error, but also low test error. This motivates the study of the implicit bias of training algorithms such as gradient descent: we want to understand which special properties of gradient descent solutions lead to good generalization. In this thesis, we focus on gradient descent and its variants, proving test error bounds and characterizing the implicit bias. Specifically, for linear classifiers, we consider both the separable and nonseparable settings. In the separable case, we first give a 1/t test error bound for stochastic gradient descent. Then, for gradient descent, we present a primal-dual framework for analyzing its implicit bias, which leads to fast margin-maximization algorithms. While these results mostly require exponentially-tailed losses, we also show that for general decreasing losses, the implicit bias can still be characterized in terms of the regularization path. In the nonseparable case, we design a nearly optimal algorithm by combining logistic regression and the perceptron. We also characterize the implicit bias of gradient descent via a unique decomposition of the training set. For neural networks, we first prove test error bounds for shallow ReLU networks using the recent idea of the neural tangent kernel, but requiring only mild overparameterization. Then we show implicit bias results for deep linear networks and deep homogeneous networks, in the form of alignment properties. (A sketch of the classical separable-case statement follows this record.)
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Ziwei Ji
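Note on the separable case mentioned in the abstract: as an illustrative sketch, not text quoted from the thesis, the classical implicit-bias statement (cf. Soudry et al. 2018; Ji and Telgarsky 2018) says that for linearly separable data (x_i, y_i) and an exponentially-tailed loss such as the logistic loss, the gradient descent iterates w_t converge in direction to the maximum-margin separator. In LaTeX (assuming amsmath):

% Sketch of the classical separable-case statement; illustrative, not quoted
% from the thesis.
\[
  \lim_{t \to \infty} \frac{w_t}{\lVert w_t \rVert_2}
  = \frac{\bar{u}}{\lVert \bar{u} \rVert_2},
  \qquad
  \bar{u} = \operatorname*{arg\,min}_{w \in \mathbb{R}^d} \lVert w \rVert_2
  \quad \text{subject to} \quad y_i \langle w, x_i \rangle \ge 1 \ \text{for all } i.
\]
% The normalized iterates approach the hard-margin SVM direction while the
% unregularized empirical risk decreases toward zero.

The thesis's primal-dual framework concerns, among other things, how quickly this margin is maximized; the display above captures only the limiting direction.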
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)