Exploitable structures and complexities of modern nonconvex optimization: Fundamental limits and efficient algorithms
Zhang, Siqi
Permalink
https://hdl.handle.net/2142/116105
Description
- Title
- Exploitable structures and complexities of modern nonconvex optimization: Fundamental limits and efficient algorithms
- Author(s)
- Zhang, Siqi
- Issue Date
- 2022-07-14
- Director of Research (if dissertation) or Advisor (if thesis)
- He, Niao
- Doctoral Committee Chair(s)
- He, Niao
- Committee Member(s)
- Chen, Xin
- Chen, Yuguo
- Sun, Ruoyu
- Department of Study
- Industrial & Enterprise Systems Engineering
- Discipline
- Industrial Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Nonconvex optimization
- Minimax optimization
- Oracle complexity
- Generalization
- Machine Learning
- Abstract
- Many modern machine learning applications are formulated as nonconvex optimization problems. Generally speaking, the study of nonconvex optimization takes two perspectives: finding the fundamental computational limits and designing efficient algorithms, which correspond to lower and upper complexity bounds in optimization theory, respectively. Rapidly developing machine learning techniques, e.g., adversarial training, meta-learning, and deep learning, bring many new computational challenges: the resulting problems are harder to solve because, for example, the objective function is no longer differentiable, or the true gradient or an unbiased estimator of it is hard to obtain, which calls for new theory and efficient algorithms. This thesis studies several nonconvex optimization problems with special structures that have broad applications in machine learning and other areas; the discussion covers both the upper and lower complexity bound perspectives.

In the first part, we study the non-asymptotic stationarity convergence of Stochastic Mirror Descent (SMD) for nonsmooth nonconvex minimization in non-Euclidean settings. We focus on a general class of nonconvex nonsmooth stochastic optimization problems whose objective consists of a relatively weakly convex function (possibly nonsmooth and non-Lipschitz) and a simple nonsmooth convex regularizer. We prove that, without the use of mini-batches, SMD is guaranteed to converge to a stationary point with an $\mathcal{O}(\epsilon^{-4})$ sample complexity. This efficiency estimate matches existing results for the stochastic subgradient method, while being evaluated under a stronger stationarity measure. (An illustrative sketch of the SMD update appears after the record fields below.)

In the second part, we consider Conditional Stochastic Optimization (CSO) problems, which cover a variety of applications such as invariant learning, causal inference, and meta-learning. Constructing unbiased gradient estimators for such problems is challenging due to their composition structure. We propose the Biased Stochastic Gradient Descent (BSGD) algorithm and establish its sample complexities for general nonconvex problems. To improve performance, we further propose an accelerated algorithm, Biased SpiderBoost (BSpiderBoost), based on the recursive variance reduction technique; its improved sample complexity matches the corresponding lower complexity bound. We also conduct numerical experiments on several tasks to illustrate the performance of the proposed algorithms. (A minimal sketch of the BSGD estimator likewise appears below.)

In the third part, we study lower complexity bounds for Nonconvex-Strongly-Concave (NC-SC) minimax optimization problems, in both the general and the averaged smooth finite-sum settings. We establish nontrivial lower complexity bounds of $\Omega(\sqrt{\kappa}\Delta L\epsilon^{-2})$ and $\Omega(n+\sqrt{n\kappa}\Delta L\epsilon^{-2})$ for the two settings, respectively, where $\kappa$ is the condition number, $L$ is the smoothness constant, $\Delta$ is the initial gap, and $n$ is the number of component functions. Our results reveal substantial gaps between these limits and the best-known upper bounds in the literature.

In the last part, we investigate the generalization performance of nonconvex minimax optimization problems. The existing literature on this topic is restrictive in that it generally requires a case-by-case study for each specific algorithm, and its analysis often resorts to function-value-based measures, which may not fit the nonconvex structure well. Here we initiate the study of generalization performance measured by the stationarity of the primal function, and resort to a uniform convergence argument, which provides generalization error bounds independent of the choice of algorithm. Specifically, the sample complexities to achieve an $\epsilon$-uniform convergence and an $\epsilon$-generalization error are $\tilde{\mathcal{O}}(d\kappa^2\epsilon^{-2})$ and $\tilde{\mathcal{O}}(d\epsilon^{-4})$ for the NC-SC and nonconvex-concave (NC-C) settings, respectively, where $d$ is the problem dimension. These results also allow us to derive the gradient complexity for solving the corresponding population problems, which matches the state of the art in the existing literature.
- Graduation Semester
- 2022-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Siqi Zhang
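
As a rough companion to the SMD discussion in the abstract, the following is a minimal sketch, not the thesis's implementation: it assumes the Euclidean mirror map (so the Bregman proximal step reduces to an ordinary proximal step), an $\ell_1$ regularizer handled by soft-thresholding, and a toy least-squares loss; the names (`smd_l1`, `grad_sample`) and parameter values are hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (closed form for the l1 regularizer)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def smd_l1(grad_sample, x0, stepsize, lam, n_iters, rng):
    """Stochastic mirror descent with the Euclidean mirror map and an l1 regularizer.

    grad_sample(x, rng) returns a single stochastic subgradient of the weakly
    convex part at x (one sample per iteration, no mini-batching).
    """
    x = x0.copy()
    iterates = [x.copy()]
    for _ in range(n_iters):
        g = grad_sample(x, rng)                               # stochastic subgradient g_t
        x = soft_threshold(x - stepsize * g, stepsize * lam)  # mirror/prox step
        iterates.append(x.copy())
    # Output rule: return a uniformly sampled iterate, the usual choice when
    # stating nonconvex stationarity guarantees.
    return iterates[rng.integers(len(iterates))]

# Toy usage: stochastic subgradients of a least-squares loss on random data.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((200, 20)), rng.standard_normal(200)
def grad_sample(x, rng):
    i = rng.integers(A.shape[0])
    return (A[i] @ x - b[i]) * A[i]

x_hat = smd_l1(grad_sample, np.zeros(20), stepsize=1e-2, lam=0.1, n_iters=5000, rng=rng)
```

With a general non-Euclidean mirror map, the soft-thresholding line would be replaced by a Bregman proximal step tailored to that geometry; the sketch fixes the Euclidean case only to stay concrete.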
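
In the same spirit, here is a minimal sketch of the biased gradient estimator behind BSGD for a CSO objective of the form $F(x) = \mathbb{E}_{\xi}\big[f_{\xi}\big(\mathbb{E}_{\eta\mid\xi}[g_{\eta}(x,\xi)]\big)\big]$: the inner conditional expectation is replaced by a small-batch average, which is what makes the estimator biased. The callables (`sample_outer`, `sample_inner`, `g`, `jac_g`, `grad_f`) and the toy instance are hypothetical placeholders, not the experiments from the thesis.

```python
import numpy as np

def bsgd(x0, sample_outer, sample_inner, g, jac_g, grad_f,
         inner_batch, stepsize, n_iters, rng):
    """Biased SGD for F(x) = E_xi[ f_xi( E_{eta|xi}[ g_eta(x, xi) ] ) ].

    The inner conditional expectation is estimated with `inner_batch` samples,
    so the chain-rule gradient estimator is biased, with the bias shrinking
    as the inner batch grows.
    """
    x = x0.copy()
    for _ in range(n_iters):
        xi = sample_outer(rng)
        etas = [sample_inner(xi, rng) for _ in range(inner_batch)]
        y_hat = np.mean([g(x, xi, eta) for eta in etas], axis=0)      # inner-mean estimate
        J_hat = np.mean([jac_g(x, xi, eta) for eta in etas], axis=0)  # inner Jacobian estimate
        x = x - stepsize * (J_hat.T @ grad_f(y_hat, xi))              # biased chain-rule step
    return x

# Toy instance: g(x, xi, eta) = x + eta (noisy inner map), f_xi(y) = 0.5 * ||y - xi||^2.
rng = np.random.default_rng(0)
x_hat = bsgd(
    x0=np.zeros(5),
    sample_outer=lambda r: r.standard_normal(5),
    sample_inner=lambda xi, r: 0.1 * r.standard_normal(5),
    g=lambda x, xi, eta: x + eta,
    jac_g=lambda x, xi, eta: np.eye(5),
    grad_f=lambda y, xi: y - xi,
    inner_batch=8, stepsize=0.1, n_iters=500, rng=rng,
)
```

BSpiderBoost keeps the same biased inner-mean estimate but, as the abstract notes, wraps it in a recursive variance-reduction update of the outer gradient; that refinement is omitted here for brevity.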
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)