Towards faster algorithms for large-scale constrained and min-max problems
Xiao, Peijun
Permalink
https://hdl.handle.net/2142/115712
Description
- Title
- Towards faster algorithms for large-scale constrained and min-max problems
- Author(s)
- Xiao, Peijun
- Issue Date
- 2022-04-19
- Director of Research (if dissertation) or Advisor (if thesis)
- Sun, Ruoyu
- Doctoral Committee Chair(s)
- Sun, Ruoyu
- Committee Member(s)
- Hu, Bin
- Beck, Carolyn
- He, Niao
- Wang, Qiong
- Department of Study
- Industrial and Enterprise Systems Engineering
- Discipline
- Industrial Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Optimization, Complexity
- Abstract
- This thesis aims to analyze and develop fast algorithms for solving large-scale constrained and min-max problems. We focus on three settings: block decomposition methods for unconstrained and linearly constrained problems; single-loop algorithms for min-max optimization; and compressed gradient-based methods for unconstrained and min-max problems.

In the first setting, we study block decomposition methods. Block decomposition is one of the most popular ideas for solving large-scale problems: at each iteration, we decompose the variable vector into multiple blocks and update a single block with the other blocks fixed. The blocks can be updated in certain orders, either deterministic or random. There are at least two classes of deterministic update orders: symmetric and nonsymmetric. A basic nonsymmetric order is the cyclic order, which updates the blocks sequentially. Recently, coordinate descent with the cyclic order has been shown to be $O(n^2)$ times slower than randomized versions in the worst case, where $n$ is the number of blocks. A natural question is whether the existing symmetric deterministic update orders can achieve faster convergence rates than the cyclic order, or even rates as good as those of the randomized versions. This question is valuable because it is not always feasible or easy to select blocks in random order for large-scale problems. In this thesis, we provide a negative answer by proving that, for unconstrained problems, coordinate descent with two deterministic symmetric update orders can be $O(n^2)$ times slower than randomized coordinate descent. Beyond coordinate descent, we also study constrained optimization and a popular block-decomposition algorithm, the Alternating Direction Method of Multipliers (ADMM). We empirically demonstrate that the convergence speed of ADMM with these two symmetric update orders can be roughly $n^2$ times slower than that of randomly permuted ADMM.

After studying unconstrained and constrained problems, we turn to a more general class of problems: min-max problems. Min-max optimization arises in many machine learning applications, such as robust training and fairness-aware training. Nevertheless, it is still largely unclear which algorithms are the most efficient for large-scale min-max problems. Our goal is to develop algorithms with fast convergence rates for solving such problems. The classical gradient descent ascent (GDA) algorithm exhibits oscillation and is not guaranteed to converge when the inner problem is not strongly convex. We introduce a "smoothing" scheme combined with GDA to stabilize the oscillation and ensure convergence to stationary solutions. We prove that the stabilized GDA algorithm achieves an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions; this is the first $O(1/\epsilon^2)$ result for this important class of nonconvex-concave problems. We also show that the proposed algorithm achieves an $O(1/\epsilon^4)$ iteration complexity for general nonconvex-concave problems, which matches the best existing iteration complexity for general problems. We further extend the algorithm to the multi-block case and illustrate its practical efficiency on robust training.
Finally, we study distributed optimization algorithms, in which multiple machines are used to solve a single problem. There has recently been a surge of interest in distributed optimization in machine learning \citep{recht2011hogwild, dean2012large}. One major challenge in distributed optimization is the limited communication bandwidth. A natural solution is to compress the communicated messages, which leads to compressed distributed algorithms. Nevertheless, compressing messages may slow down convergence, so we aim to design compressed distributed algorithms that achieve the same convergence speed as their non-compressed counterparts. Another important consideration is to find a compression scheme that works for a broad class of problems, including unconstrained and min-max problems. So far, most existing compressed distributed algorithms either cannot preserve the convergence speed or do not apply to min-max problems. We observe that controlling the effective error at the central server preserves the convergence speed, and that this idea is general enough to cover both settings. Based on this observation, we propose a class of compressed distributed algorithms that achieve the same convergence rates as their non-compressed counterparts for solving unconstrained and min-max problems. More specifically, for unconstrained problems we analyze the strongly convex and nonconvex settings, and for min-max problems we analyze the nonconvex-strongly-concave setting.
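To make the block update orders above concrete, here is a minimal NumPy sketch of coordinate descent on a quadratic with a cyclic versus a randomly permuted order. The equal-correlation matrix used below is an illustrative assumption in the spirit of the worst-case constructions discussed in the abstract, not the thesis's exact example.

```python
import numpy as np

def coordinate_descent(A, b, x0, n_epochs, order="cyclic", seed=0):
    """Exact coordinate minimization of f(x) = 0.5 * x^T A x - b^T x.

    order="cyclic" sweeps the coordinates in a fixed sequence;
    order="permuted" draws a fresh random permutation every epoch.
    A is assumed symmetric positive definite.
    """
    rng = np.random.default_rng(seed)
    n = len(b)
    x = x0.astype(float).copy()
    for _ in range(n_epochs):
        idx = range(n) if order == "cyclic" else rng.permutation(n)
        for i in idx:
            # Setting the i-th partial derivative (A @ x - b)[i] to zero
            # and solving for x[i] gives the exact single-coordinate update.
            x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x

def objective(A, b, x):
    return 0.5 * x @ A @ x - b @ x

# Equal-correlation matrices of the form 1*1^T + delta*I are the kind used in
# worst-case comparisons of update orders (an illustrative choice here, not
# the thesis's exact construction).
n, delta = 100, 1e-2
A = np.ones((n, n)) + delta * np.eye(n)
b = np.zeros(n)
x0 = np.ones(n)
for order in ("cyclic", "permuted"):
    x = coordinate_descent(A, b, x0, n_epochs=20, order=order)
    print(order, objective(A, b, x))
```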
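The smoothing idea for GDA can likewise be sketched at a high level. The loop below descends a proximally smoothed objective in $x$, ascends in $y$, and lets an auxiliary variable $z$ slowly track $x$; the step sizes, the smoothing parameter `p`, and the bilinear toy problem are illustrative assumptions, not the thesis's exact algorithm or analysis.

```python
import numpy as np

def smoothed_gda(grad_x, grad_y, x0, y0, steps,
                 eta_x=0.2, eta_y=0.2, p=1.0, beta=0.2):
    """Schematic single-loop smoothed GDA for min_x max_y f(x, y).

    Plain GDA alternates
        x <- x - eta_x * grad_x(x, y),   y <- y + eta_y * grad_y(x, y).
    The smoothing scheme keeps an auxiliary variable z, descends the
    proximally smoothed function f(x, y) + (p/2) * ||x - z||^2 in x, and
    lets z slowly track x, which damps the oscillation of plain GDA.
    All step sizes and parameters here are illustrative placeholders.
    """
    x, y, z = x0.copy(), y0.copy(), x0.copy()
    for _ in range(steps):
        x = x - eta_x * (grad_x(x, y) + p * (x - z))  # descent on the smoothed objective
        y = y + eta_y * grad_y(x, y)                  # ascent on the original objective
        z = z + beta * (x - z)                        # slow averaging of x
    return x, y

# Toy bilinear problem min_x max_y x*y, on which plain GDA spirals outward.
gx = lambda x, y: y   # d/dx (x*y)
gy = lambda x, y: x   # d/dy (x*y)
x, y = smoothed_gda(gx, gy, np.array([1.0]), np.array([1.0]), steps=500)
print(x, y)  # both drift toward 0, the unique stationary point
```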
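Finally, here is a generic sketch of a compressed distributed method with error control: each worker compresses its message (with a top-$k$ compressor, an assumed choice) and carries the compression residual into the next round so the accumulated error stays bounded. This is a standard error-feedback template for illustration; the thesis's specific mechanism for controlling the effective error at the central server differs in detail.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def compressed_distributed_gd(grads, x0, steps, lr=0.1, k=2):
    """Illustrative compressed distributed gradient descent with error feedback.

    grads[m](x) returns worker m's gradient at x. Each worker compresses its
    message with top-k and carries the residual (what was not transmitted)
    into the next round, so the accumulated compression error stays bounded.
    This is a generic error-feedback template, not the thesis's algorithm.
    """
    x = x0.copy()
    residuals = [np.zeros_like(x0) for _ in grads]
    for _ in range(steps):
        msgs = []
        for m, g in enumerate(grads):
            v = lr * g(x) + residuals[m]   # add back what was left unsent
            c = top_k(v, k)                # compressed message sent to the server
            residuals[m] = v - c           # keep the compression error locally
            msgs.append(c)
        x = x - np.mean(msgs, axis=0)      # server averages the compressed updates
    return x

# Toy usage: two workers, each holding half of a least-squares problem.
rng = np.random.default_rng(0)
A1, A2 = rng.normal(size=(20, 5)), rng.normal(size=(20, 5))
b1, b2 = rng.normal(size=20), rng.normal(size=20)
grads = [lambda x, A=A1, b=b1: A.T @ (A @ x - b) / len(b),
         lambda x, A=A2, b=b2: A.T @ (A @ x - b) / len(b)]
x = compressed_distributed_gd(grads, np.zeros(5), steps=500)
print(x)  # roughly tracks the least-squares solution of the stacked problem
```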
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Peijun Xiao
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)