Towards faster algorithms for large-scale constrained and min-max problems
Xiao, Peijun
Permalink
https://hdl.handle.net/2142/115712
Description
- Title
- Towards faster algorithms for large-scale constrained and min-max problems
- Author(s)
- Xiao, Peijun
- Issue Date
- 2022-04-19
- Director of Research (if dissertation) or Advisor (if thesis)
- Sun, Ruoyu
- Doctoral Committee Chair(s)
- Sun, Ruoyu
- Committee Member(s)
- Hu, Bin
- Beck, Carolyn
- He, Niao
- Wang, Qiong
- Department of Study
- Industrial and Enterprise Systems Engineering
- Discipline
- Industrial Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Optimization, Complexity
- Abstract
- This thesis aims to analyze and develop fast algorithms for solving large-scale constrained and min-max problems. We focus on three settings: block decomposition methods for unconstrained and linearly constrained problems; single-loop algorithms for min-max optimization; and compressed gradient-based methods for unconstrained and min-max problems.

In the first setting, we study block decomposition methods. Block decomposition is one of the most popular ideas for solving large-scale problems: at each iteration, we decompose the variable vector into multiple blocks and update a single block with the other blocks fixed. The blocks can be updated in certain orders, either deterministic or random. There are at least two classes of deterministic update orders: symmetric and nonsymmetric. A basic nonsymmetric order is the cyclic order, which updates the blocks sequentially. Recently, coordinate descent with the cyclic order has been shown to be $O(n^2)$ times slower than randomized versions in the worst case, where $n$ is the number of blocks. A natural question is whether the existing symmetric deterministic update orders can achieve faster convergence rates than the cyclic order, or even rates as good as those of the randomized versions. This question is valuable because it is not always feasible or easy to select blocks in random order for large-scale problems. In this thesis, we provide a negative answer by proving that, for unconstrained problems, coordinate descent with two deterministic symmetric update orders can be $O(n^2)$ times slower than randomized coordinate descent. Beyond coordinate descent, we also study constrained optimization and a popular block-decomposition algorithm, the Alternating Direction Method of Multipliers (ADMM). We empirically demonstrate that the convergence speed of ADMM with these two symmetric update orders can be roughly $n^2$ times slower than that of randomly permuted ADMM.

After studying unconstrained and constrained problems, we turn to a more general class of problems: min-max problems. Min-max optimization arises in many machine learning applications, such as robust training and fairness-aware training. Nevertheless, it is still largely unclear which algorithms are the most efficient for large-scale min-max problems. Our goal is to develop algorithms with fast convergence rates for solving such problems. The classical gradient descent ascent (GDA) algorithm exhibits oscillation and is not guaranteed to converge when the inner problem is not strongly convex. We introduce a "smoothing" scheme combined with GDA to stabilize the oscillation and ensure convergence to stationary solutions. We prove that the stabilized GDA algorithm achieves an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions; this is the first $O(1/\epsilon^2)$ result for this important class of nonconvex-concave problems. We also show that the proposed algorithm achieves an $O(1/\epsilon^4)$ iteration complexity for general nonconvex-concave problems, which matches the best existing iteration complexity for general problems. We further extend the algorithm to the multi-block case and illustrate its practical efficiency on robust training.
Finally, we study distributed optimization algorithms, in which multiple machines are used to solve a single problem. There has recently been a surge of interest in distributed optimization in machine learning \citep{recht2011hogwild, dean2012large}. One major challenge in distributed optimization is the limited communication bandwidth. A natural solution is to compress the communicated messages, which leads to compressed distributed algorithms. Nevertheless, compressing messages may slow down convergence, so we aim to design compressed distributed algorithms that achieve the same convergence speed as their non-compressed counterparts. Another important consideration is to find a compression scheme that works for a broad class of problems, including unconstrained and min-max problems. So far, most existing compressed distributed algorithms either cannot preserve the convergence speed or do not apply to min-max problems. We observe that controlling the effective error at the central server preserves the convergence speed, and that this idea is general enough to cover both settings. Based on this observation, we propose a class of compressed distributed algorithms that achieve the same convergence rates as their non-compressed counterparts for solving unconstrained and min-max problems. More specifically, for unconstrained problems we analyze the strongly convex and nonconvex settings, and for min-max problems we analyze the nonconvex-strongly-concave setting.
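To make the block update orders above concrete, here is a minimal NumPy sketch of coordinate descent on a quadratic with a cyclic versus a randomly permuted order. The equal-correlation matrix used below is an illustrative assumption in the spirit of the worst-case constructions discussed in the abstract, not the thesis's exact example.

```python
import numpy as np

def coordinate_descent(A, b, x0, n_epochs, order="cyclic", seed=0):
    """Exact coordinate minimization of f(x) = 0.5 * x^T A x - b^T x.

    order="cyclic" sweeps the coordinates in a fixed sequence;
    order="permuted" draws a fresh random permutation every epoch.
    A is assumed symmetric positive definite.
    """
    rng = np.random.default_rng(seed)
    n = len(b)
    x = x0.astype(float).copy()
    for _ in range(n_epochs):
        idx = range(n) if order == "cyclic" else rng.permutation(n)
        for i in idx:
            # Setting the i-th partial derivative (A @ x - b)[i] to zero
            # and solving for x[i] gives the exact single-coordinate update.
            x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x

def objective(A, b, x):
    return 0.5 * x @ A @ x - b @ x

# Equal-correlation matrices of the form 1*1^T + delta*I are the kind used in
# worst-case comparisons of update orders (an illustrative choice here, not
# the thesis's exact construction).
n, delta = 100, 1e-2
A = np.ones((n, n)) + delta * np.eye(n)
b = np.zeros(n)
x0 = np.ones(n)
for order in ("cyclic", "permuted"):
    x = coordinate_descent(A, b, x0, n_epochs=20, order=order)
    print(order, objective(A, b, x))
```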
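The smoothing idea for GDA can likewise be sketched at a high level. The loop below descends a proximally smoothed objective in $x$, ascends in $y$, and lets an auxiliary variable $z$ slowly track $x$; the step sizes, the smoothing parameter `p`, and the bilinear toy problem are illustrative assumptions, not the thesis's exact algorithm or analysis.

```python
import numpy as np

def smoothed_gda(grad_x, grad_y, x0, y0, steps,
                 eta_x=0.2, eta_y=0.2, p=1.0, beta=0.2):
    """Schematic single-loop smoothed GDA for min_x max_y f(x, y).

    Plain GDA alternates
        x <- x - eta_x * grad_x(x, y),   y <- y + eta_y * grad_y(x, y).
    The smoothing scheme keeps an auxiliary variable z, descends the
    proximally smoothed function f(x, y) + (p/2) * ||x - z||^2 in x, and
    lets z slowly track x, which damps the oscillation of plain GDA.
    All step sizes and parameters here are illustrative placeholders.
    """
    x, y, z = x0.copy(), y0.copy(), x0.copy()
    for _ in range(steps):
        x = x - eta_x * (grad_x(x, y) + p * (x - z))  # descent on the smoothed objective
        y = y + eta_y * grad_y(x, y)                  # ascent on the original objective
        z = z + beta * (x - z)                        # slow averaging of x
    return x, y

# Toy bilinear problem min_x max_y x*y, on which plain GDA spirals outward.
gx = lambda x, y: y   # d/dx (x*y)
gy = lambda x, y: x   # d/dy (x*y)
x, y = smoothed_gda(gx, gy, np.array([1.0]), np.array([1.0]), steps=500)
print(x, y)  # both drift toward 0, the unique stationary point
```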
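Finally, here is a generic sketch of a compressed distributed method with error control: each worker compresses its message (with a top-$k$ compressor, an assumed choice) and carries the compression residual into the next round so the accumulated error stays bounded. This is a standard error-feedback template for illustration; the thesis's specific mechanism for controlling the effective error at the central server differs in detail.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def compressed_distributed_gd(grads, x0, steps, lr=0.1, k=2):
    """Illustrative compressed distributed gradient descent with error feedback.

    grads[m](x) returns worker m's gradient at x. Each worker compresses its
    message with top-k and carries the residual (what was not transmitted)
    into the next round, so the accumulated compression error stays bounded.
    This is a generic error-feedback template, not the thesis's algorithm.
    """
    x = x0.copy()
    residuals = [np.zeros_like(x0) for _ in grads]
    for _ in range(steps):
        msgs = []
        for m, g in enumerate(grads):
            v = lr * g(x) + residuals[m]   # add back what was left unsent
            c = top_k(v, k)                # compressed message sent to the server
            residuals[m] = v - c           # keep the compression error locally
            msgs.append(c)
        x = x - np.mean(msgs, axis=0)      # server averages the compressed updates
    return x

# Toy usage: two workers, each holding half of a least-squares problem.
rng = np.random.default_rng(0)
A1, A2 = rng.normal(size=(20, 5)), rng.normal(size=(20, 5))
b1, b2 = rng.normal(size=20), rng.normal(size=20)
grads = [lambda x, A=A1, b=b1: A.T @ (A @ x - b) / len(b),
         lambda x, A=A2, b=b2: A.T @ (A @ x - b) / len(b)]
x = compressed_distributed_gd(grads, np.zeros(5), steps=500)
print(x)  # roughly tracks the least-squares solution of the stacked problem
```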
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Peijun Xiao
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)