On the characterization of the global landscape of neural networks
Li, Dawei
Permalink
https://hdl.handle.net/2142/115894
Description
- Title
- On the characterization of the global landscape of neural networks
- Author(s)
- Li, Dawei
- Issue Date
- 2022-07-12
- Director of Research (if dissertation) or Advisor (if thesis)
- Sun, Ruoyu
- Doctoral Committee Chair(s)
- Sun, Ruoyu
- Committee Member(s)
- Chen, Xin
- Srikant, Rayadurgam
- Etesami, Seyed Rasoul
- Department of Study
- Industrial and Enterprise Systems Engineering
- Discipline
- Industrial Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Landscape
- Neural Network
- Deep Learning
- Non-convex Optimization
- Abstract
- Understanding why deep neural networks perform well has attracted much attention recently. The non-convexity of the associated loss functions, which may produce a bad landscape, is one of the major concerns in neural network training, yet the recent success of neural networks suggests that their loss landscape is not too bad. Nevertheless, a systematic characterization of the landscape has yet to be carried out. In this thesis, we aim for a more complete understanding of the global landscape of neural networks. In the first part, we study the existence of sub-optimal local minima for multi-layer networks. In particular, we prove that for neural networks with generic input data and smooth nonlinear activation functions, sub-optimal local minima can exist, no matter how wide the network is (as long as the last hidden layer has at least two neurons). This result overturns a classical claim that ``there exists no sub-optimal local minimum for 1-hidden-layer wide neural nets with sigmoid activation function'' and indicates that sub-optimal local minima are common even for wide neural nets. Given that sub-optimal local minima cannot be eliminated, a natural question is: what does the landscape of neural networks actually look like, and in particular, does width affect it? In the second part, we prove two results: on the positive side, for any continuous activation function, the loss surface of a class of wide networks has no sub-optimal basin, where a ``basin'' is defined as a set-wise strict local minimum; on the negative side, for a large class of networks with width below a threshold, we construct strict local minima that are not globally optimal. Together, these two results exhibit a phase transition in the landscape from narrow to wide networks and indicate the benefit of width. In the last part, we explore how this phase transition occurs via the ``generative mechanism'' of stationary points. We study a transformation called ``neuron splitting'', which maps a stationary point of a narrower network to stationary points of wider networks (see the code sketch following this record). We provide sufficient conditions under which the resulting stationary points of the wider network are local minima or saddle points: under certain conditions, a local minimum is mapped to a high-dimensional plateau containing both local minima and saddles of an arbitrarily wide network, while a saddle point can only be mapped to saddle points of wider networks by neuron splitting. Altogether, these results characterize the properties of stationary points of neural networks: their existence in different settings, their location and shape, and their evolution as the network is restructured. They not only provide a deeper understanding of the success of current wide neural networks, but also suggest potential methods for tackling the difficulties of training smaller networks.
- Graduation Semester
- 2022-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Dawei Li
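
The ``neuron splitting'' transformation described in the abstract can be illustrated in a few lines. Below is a minimal NumPy sketch under my own assumed setup (a 1-hidden-layer tanh network; the names W, v, split_neuron, and alpha are illustrative and do not come from the thesis). It shows the core property: duplicating a hidden neuron's incoming weights and sharing its outgoing weight across the two copies leaves the network's output, and hence the training loss, unchanged. Whether a stationary point mapped this way remains a local minimum or becomes a saddle is what the thesis's sufficient conditions distinguish.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 3                        # input dimension, hidden width (assumed toy sizes)
W = rng.normal(size=(m, d))        # incoming weights of the hidden layer
v = rng.normal(size=m)             # outgoing weights to the scalar output

def forward(W, v, x):
    # 1-hidden-layer network: f(x) = sum_j v[j] * tanh(W[j] . x)
    return v @ np.tanh(W @ x)

def split_neuron(W, v, k, alpha=0.5):
    # Duplicate hidden neuron k and split its outgoing weight as alpha / (1 - alpha).
    W_new = np.vstack([W, W[k:k + 1]])           # copy of neuron k's incoming weights
    v_new = np.append(v, (1.0 - alpha) * v[k])   # the new copy gets the (1 - alpha) share
    v_new[k] = alpha * v[k]                      # the original neuron keeps the alpha share
    return W_new, v_new

x = rng.normal(size=d)
W2, v2 = split_neuron(W, v, k=1, alpha=0.3)

# The wider (m + 1)-neuron network computes exactly the same function,
# so the loss value is unchanged at the split point for any choice of alpha.
assert np.allclose(forward(W, v, x), forward(W2, v2, x))
```

Varying alpha traces out a set of weight configurations with identical loss, which gives a rough intuition for the ``high-dimensional plateau'' of stationary points mentioned in the abstract.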
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)