Communication efficient large scale distributed optimization with curvature acceleration
Li, Yichuan
Permalink
https://hdl.handle.net/2142/116050
Description
- Title
- Communication efficient large scale distributed optimization with curvature acceleration
- Author(s)
- Li, Yichuan
- Issue Date
- 2022-07-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Freris, Nikolaos M.
- Voulgaris, Petros G.
- Doctoral Committee Chair(s)
- Hovakimyan, Naira
- Committee Member(s)
- Stipanovic, Dusan M.
- Salapaka, Srinivasa M.
- Department of Study
- Mechanical Science & Engineering
- Discipline
- Mechanical Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Decentralized optimization
- large scale machine learning
- Federated Learning
- Abstract
- Distributed optimization has proven to be an effective scheme for tackling the numerous problems that have emerged with the development of digital technologies and mobile smart devices. Examples include control of robot swarms, distributed and large-scale machine learning, estimation in sensor networks, and management of smart grids. In such setups, the individual interests of agents (machines) must be collectively optimized under stringent restrictions on communication bandwidth and computation resources, as well as security concerns. Balancing the effort spent on optimizing local objectives against cooperation across the network is challenging, yet necessary for reaching a global solution. Many endeavors have been made to design efficient algorithms that can harness the computational power of a network that is heterogeneous in terms of both machine hardware and data distribution. The majority of existing algorithms can be categorized as first-order methods, in which the computation is guided only by objective (sub)gradient information. Such methods have found many successes owing to their economical computation costs and simple implementation. Nevertheless, first-order algorithms suffer from a common drawback: slow convergence, especially when the objective curvature is skewed. This induces a large overall iteration count and precludes the use of first-order algorithms in applications where accurate solutions are sought within a few rounds, such as real-time computation on embedded systems and communication-expensive learning tasks.

In this dissertation, we shift our attention to second-order methods, which aim to accelerate the iterative process through the use of objective curvature. The main challenge in designing second-order methods for large-scale networks is threefold: (i) distributed implementation, (ii) balancing computation and communication, and (iii) network synchronization. These challenges are not unique to decentralized second-order methods, but they become more pronounced due to the involvement of Hessian matrices, the solution of linear systems, backtracking line search, and the approximation of curvature information, among others. We address these challenges in several steps. Chapter \ref{chapter_acc} tackles the distributed convex composite problem by presenting two algorithms that approximate the curvature information in a distributed fashion, from the perspectives of the Method of Multipliers and the Alternating Direction Method of Multipliers. By increasing the accuracy of the approximation, we show both theoretically and empirically that the proposed algorithms achieve faster convergence rates, thus providing a tunable trade-off between convergence speed and algorithmic expenditure. Chapter \ref{chapter_druid} presents a unified framework for designing algorithms that harness curvature information with communication efficiency. Several algorithms are designed and analyzed in a unified fashion, including computation schemes based on gradient descent, the Newton method, and BFGS. Their differences and advantages are discussed, analyzed, and demonstrated, offering agents significant flexibility in choosing the updating scheme best suited to their local environments. Chapter \ref{chapter_asy} further extends this framework to the asynchronous setting, where agents participate in the computation without coordination, eliminating the need for network-wide synchronization. The proposed framework imposes minimal assumptions on agents' participation frequencies and on the asynchrony among them, which further broadens the applicability of the proposed algorithms. Chapter \ref{chapter_FL} studies an emerging topic termed Federated Learning, which embodies the challenges considered in the previous chapters: stringent requirements on communication costs in terms of both frequency and message size, highly heterogeneous networks with a massive number of agents operating in drastically different local environments, non-i.i.d. data distributions with imbalanced data volumes, and strict limits on privacy. We present a novel algorithm that accommodates all of these difficulties and show that it achieves optimal algorithmic complexity. Extensive experiments demonstrate its superiority over existing methods.
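As a point of reference for the first-order/second-order distinction drawn in the abstract, the classical centralized updates for minimizing a smooth objective $f$ can be written as follows (a generic illustration only, not the specific distributed schemes developed in the dissertation):

\[ x_{k+1} = x_k - \alpha \nabla f(x_k) \quad \text{(gradient descent)}, \qquad x_{k+1} = x_k - \big[\nabla^2 f(x_k)\big]^{-1} \nabla f(x_k) \quad \text{(Newton)}. \]

The gradient step uses only first-order information and a step size $\alpha > 0$, whereas the Newton step rescales the gradient by the inverse Hessian; this is what makes it robust to skewed curvature but expensive to form, invert, and communicate across a network.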
- Graduation Semester
- 2022-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Yichuan Li
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)