The implicit bias of gradient descent: from linear classifiers to deep networks
Ji, Ziwei
Permalink
https://hdl.handle.net/2142/115475
Description
- Title
- The implicit bias of gradient descent: from linear classifiers to deep networks
- Author(s)
- Ji, Ziwei
- Issue Date
- 2022-04-18
- Director of Research (if dissertation) or Advisor (if thesis)
- Telgarsky, Matus
- Doctoral Committee Chair(s)
- Telgarsky, Matus
- Committee Member(s)
- Chekuri, Chandra
- Raginsky, Maxim
- Hsu, Daniel
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Implicit bias
- test error bounds
- gradient descent
- stochastic gradient descent
- primal-dual analysis
- Abstract
- Gradient descent and other first-order methods are used extensively in deep learning. They can find solutions with not only low training error, but also low test error. This motivates the study of the implicit bias of training algorithms such as gradient descent: we want to understand which special properties of gradient descent solutions lead to good generalization. In this thesis, we focus on gradient descent and its variants, proving test error bounds and characterizing the implicit bias. Specifically, for linear classifiers, we consider both the separable and nonseparable settings. In the separable case, we first give a 1/t test error bound for stochastic gradient descent. Then, for gradient descent, we present a primal-dual framework for analyzing its implicit bias, which leads to fast margin-maximization algorithms. While these results mostly require exponentially-tailed losses, we also show that for general decreasing losses, the implicit bias can still be characterized in terms of the regularization path. In the nonseparable case, we design a nearly optimal algorithm by combining logistic regression and the perceptron. We also characterize the implicit bias of gradient descent via a unique decomposition of the training set. For neural networks, we first prove test error bounds for shallow ReLU networks using the recent idea of the neural tangent kernel, but requiring only mild overparameterization. Then we show implicit bias results for deep linear networks and deep homogeneous networks, in the form of alignment properties. (A sketch of the classical separable-case statement follows this record.)
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Ziwei Ji
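Note on the separable case mentioned in the abstract: as an illustrative sketch, not text quoted from the thesis, the classical implicit-bias statement (cf. Soudry et al. 2018; Ji and Telgarsky 2018) says that for linearly separable data (x_i, y_i) and an exponentially-tailed loss such as the logistic loss, the gradient descent iterates w_t converge in direction to the maximum-margin separator. In LaTeX (assuming amsmath):

% Sketch of the classical separable-case statement; illustrative, not quoted
% from the thesis.
\[
  \lim_{t \to \infty} \frac{w_t}{\lVert w_t \rVert_2}
  = \frac{\bar{u}}{\lVert \bar{u} \rVert_2},
  \qquad
  \bar{u} = \operatorname*{arg\,min}_{w \in \mathbb{R}^d} \lVert w \rVert_2
  \quad \text{subject to} \quad y_i \langle w, x_i \rangle \ge 1 \ \text{for all } i.
\]
% The normalized iterates approach the hard-margin SVM direction while the
% unregularized empirical risk decreases toward zero.

The thesis's primal-dual framework concerns, among other things, how quickly this margin is maximized; the display above captures only the limiting direction.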
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)