Exploitable structures and complexities of modern nonconvex optimization: Fundamental limits and efficient algorithms
Zhang, Siqi
Permalink
https://hdl.handle.net/2142/116105
Description
- Title
- Exploitable structures and complexities of modern nonconvex optimization: Fundamental limits and efficient algorithms
- Author(s)
- Zhang, Siqi
- Issue Date
- 2022-07-14
- Director of Research (if dissertation) or Advisor (if thesis)
- He, Niao
- Doctoral Committee Chair(s)
- He, Niao
- Committee Member(s)
- Chen, Xin
- Chen, Yuguo
- Sun, Ruoyu
- Department of Study
- Industrial & Enterprise Systems Engineering
- Discipline
- Industrial Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Nonconvex optimization
- Minimax optimization
- Oracle complexity
- Generalization
- Machine Learning
- Abstract
- Many modern machine learning applications are formulated as nonconvex optimization problems. Generally speaking, the study of nonconvex optimization takes two perspectives: finding the fundamental computational limits and designing efficient algorithms, which correspond to lower and upper complexity bounds in optimization theory, respectively. Rapidly developing machine learning techniques, e.g., adversarial training, meta-learning, and deep learning, bring many new computational challenges: the resulting problems are harder to solve because, for example, the objective function is no longer differentiable, or the true gradient or an unbiased estimator of it is hard to obtain, which calls for new theory and efficient algorithms. This thesis studies several nonconvex optimization problems with special structures that have broad applications in machine learning and other areas; the discussion covers both the upper and lower complexity bound perspectives.

In the first part, we study the non-asymptotic stationarity convergence of Stochastic Mirror Descent (SMD) for nonsmooth nonconvex minimization in non-Euclidean settings. We focus on a general class of nonconvex nonsmooth stochastic optimization problems whose objective consists of a relatively weakly convex function (possibly nonsmooth and non-Lipschitz) and a simple nonsmooth convex regularizer. We prove that, without the use of mini-batches, SMD is guaranteed to converge to a stationary point with an $\mathcal{O}(\epsilon^{-4})$ sample complexity. This efficiency estimate matches existing results for the stochastic subgradient method, while being evaluated under a stronger stationarity measure. (An illustrative sketch of the SMD update appears after the record fields below.)

In the second part, we consider Conditional Stochastic Optimization (CSO) problems, which cover a variety of applications such as invariant learning, causal inference, and meta-learning. Constructing unbiased gradient estimators for such problems is challenging due to their composition structure. We propose the Biased Stochastic Gradient Descent (BSGD) algorithm and establish its sample complexities for general nonconvex problems. To improve performance, we further propose an accelerated algorithm, Biased SpiderBoost (BSpiderBoost), based on the recursive variance reduction technique; its improved sample complexity matches the corresponding lower complexity bound. We also conduct numerical experiments on several tasks to illustrate the performance of the proposed algorithms. (A minimal sketch of the BSGD estimator likewise appears below.)

In the third part, we study lower complexity bounds for Nonconvex-Strongly-Concave (NC-SC) minimax optimization problems, in both the general and the averaged smooth finite-sum settings. We establish nontrivial lower complexity bounds of $\Omega(\sqrt{\kappa}\Delta L\epsilon^{-2})$ and $\Omega(n+\sqrt{n\kappa}\Delta L\epsilon^{-2})$ for the two settings, respectively, where $\kappa$ is the condition number, $L$ is the smoothness constant, $\Delta$ is the initial gap, and $n$ is the number of component functions. Our results reveal substantial gaps between these limits and the best-known upper bounds in the literature.

In the last part, we investigate the generalization performance of nonconvex minimax optimization problems. The existing literature on this topic is restrictive in that it generally requires a case-by-case study for each specific algorithm, and its analysis often resorts to function-value-based measures, which may not fit the nonconvex structure well. Here we initiate the study of generalization performance measured by the stationarity of the primal function, and resort to a uniform convergence argument, which provides generalization error bounds independent of the choice of algorithm. Specifically, the sample complexities to achieve an $\epsilon$-uniform convergence and an $\epsilon$-generalization error are $\tilde{\mathcal{O}}(d\kappa^2\epsilon^{-2})$ and $\tilde{\mathcal{O}}(d\epsilon^{-4})$ for the NC-SC and nonconvex-concave (NC-C) settings, respectively, where $d$ is the problem dimension. These results also allow us to derive the gradient complexity for solving the corresponding population problems, which matches the state of the art in the existing literature.
- Graduation Semester
- 2022-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Siqi Zhang
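
As a rough companion to the SMD discussion in the abstract, the following is a minimal sketch, not the thesis's implementation: it assumes the Euclidean mirror map (so the Bregman proximal step reduces to an ordinary proximal step), an $\ell_1$ regularizer handled by soft-thresholding, and a toy least-squares loss; the names (`smd_l1`, `grad_sample`) and parameter values are hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (closed form for the l1 regularizer)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def smd_l1(grad_sample, x0, stepsize, lam, n_iters, rng):
    """Stochastic mirror descent with the Euclidean mirror map and an l1 regularizer.

    grad_sample(x, rng) returns a single stochastic subgradient of the weakly
    convex part at x (one sample per iteration, no mini-batching).
    """
    x = x0.copy()
    iterates = [x.copy()]
    for _ in range(n_iters):
        g = grad_sample(x, rng)                               # stochastic subgradient g_t
        x = soft_threshold(x - stepsize * g, stepsize * lam)  # mirror/prox step
        iterates.append(x.copy())
    # Output rule: return a uniformly sampled iterate, the usual choice when
    # stating nonconvex stationarity guarantees.
    return iterates[rng.integers(len(iterates))]

# Toy usage: stochastic subgradients of a least-squares loss on random data.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((200, 20)), rng.standard_normal(200)
def grad_sample(x, rng):
    i = rng.integers(A.shape[0])
    return (A[i] @ x - b[i]) * A[i]

x_hat = smd_l1(grad_sample, np.zeros(20), stepsize=1e-2, lam=0.1, n_iters=5000, rng=rng)
```

With a general non-Euclidean mirror map, the soft-thresholding line would be replaced by a Bregman proximal step tailored to that geometry; the sketch fixes the Euclidean case only to stay concrete.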
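
In the same spirit, here is a minimal sketch of the biased gradient estimator behind BSGD for a CSO objective of the form $F(x) = \mathbb{E}_{\xi}\big[f_{\xi}\big(\mathbb{E}_{\eta\mid\xi}[g_{\eta}(x,\xi)]\big)\big]$: the inner conditional expectation is replaced by a small-batch average, which is what makes the estimator biased. The callables (`sample_outer`, `sample_inner`, `g`, `jac_g`, `grad_f`) and the toy instance are hypothetical placeholders, not the experiments from the thesis.

```python
import numpy as np

def bsgd(x0, sample_outer, sample_inner, g, jac_g, grad_f,
         inner_batch, stepsize, n_iters, rng):
    """Biased SGD for F(x) = E_xi[ f_xi( E_{eta|xi}[ g_eta(x, xi) ] ) ].

    The inner conditional expectation is estimated with `inner_batch` samples,
    so the chain-rule gradient estimator is biased, with the bias shrinking
    as the inner batch grows.
    """
    x = x0.copy()
    for _ in range(n_iters):
        xi = sample_outer(rng)
        etas = [sample_inner(xi, rng) for _ in range(inner_batch)]
        y_hat = np.mean([g(x, xi, eta) for eta in etas], axis=0)      # inner-mean estimate
        J_hat = np.mean([jac_g(x, xi, eta) for eta in etas], axis=0)  # inner Jacobian estimate
        x = x - stepsize * (J_hat.T @ grad_f(y_hat, xi))              # biased chain-rule step
    return x

# Toy instance: g(x, xi, eta) = x + eta (noisy inner map), f_xi(y) = 0.5 * ||y - xi||^2.
rng = np.random.default_rng(0)
x_hat = bsgd(
    x0=np.zeros(5),
    sample_outer=lambda r: r.standard_normal(5),
    sample_inner=lambda xi, r: 0.1 * r.standard_normal(5),
    g=lambda x, xi, eta: x + eta,
    jac_g=lambda x, xi, eta: np.eye(5),
    grad_f=lambda y, xi: y - xi,
    inner_batch=8, stepsize=0.1, n_iters=500, rng=rng,
)
```

BSpiderBoost keeps the same biased inner-mean estimate but, as the abstract notes, wraps it in a recursive variance-reduction update of the outer gradient; that refinement is omitted here for brevity.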
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)