Communication-centric cross-stack acceleration for distributed machine learning
Li, Youjie
Permalink
https://hdl.handle.net/2142/117774
Description
- Title
- Communication-centric cross-stack acceleration for distributed machine learning
- Author(s)
- Li, Youjie
- Issue Date
- 2022-11-23
- Director of Research (if dissertation) or Advisor (if thesis)
- Kim, Nam Sung
- Doctoral Committee Chair(s)
- Kim, Nam Sung
- Committee Member(s)
- Hwu, Wen-mei
- Torrellas, Josep
- Chen, Deming
- Department of Study
- Electrical and Computer Engineering
- Discipline
- Electrical and Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Distributed Machine Learning
- Distributed Training
- Parallel Computing
- In-Network Computing
- Smart NICs
- Programmable Switches
- Pipelined SGD
- Machine Learning Framework
- Abstract
- Distributed training has long been the Holy Grail of machine learning systems, as it is an indispensable technique for meeting the ever-growing computation and memory demands caused by the unprecedented scaling of model sizes and data volumes. However, even distributed training takes inordinate time, a large fraction of which is spent on communication overhead, either in inter-server networks or in intra-server interconnects. In this dissertation, we propose cross-stack solutions spanning hardware, software, and algorithms to accelerate and scale distributed training systems. First, we propose in-network computing that leverages novel network hardware in modern data centers, such as programmable network interface cards and switches, not only to compress traffic volume in real time but also to reduce network hops on the fly. Second, we present new algorithms for efficient communication, such as gradient compression and pipelining computation with communication, which further shrink the network overhead while maintaining training convergence and model accuracy (an illustrative sketch of gradient compression follows this record's description). Third, we develop a next-generation machine learning framework built on novel schemes of task decomposition and late binding, which trains massive models even without sufficient memory while drastically reducing communication overhead over a server's interconnects.
- Graduation Semester
- 2022-12
- Type of Resource
- Thesis
- Copyright and License Information
- In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of the University of Illinois at Urbana-Champaign's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a license from RightsLink.
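Illustrative sketch
- As a companion to the abstract's mention of gradient compression (see also the "Pipelined SGD" and "Distributed Training" keywords), the following is a minimal, self-contained sketch of top-k gradient sparsification with error feedback. It is not taken from the dissertation: the function names, the 1% sparsity ratio, and the use of NumPy are assumptions chosen purely for illustration of the general technique.

```python
# Hypothetical sketch of top-k gradient compression with error feedback.
# Not the dissertation's algorithm; an illustration of the general idea of
# shrinking communication volume while preserving convergence.
import numpy as np

def topk_compress(grad, ratio=0.01, residual=None):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.

    Dropped entries accumulate in `residual` (error feedback) so they are
    eventually transmitted in later iterations.
    Returns (indices, values, new_residual).
    """
    if residual is not None:
        grad = grad + residual                      # fold in previously dropped error
    k = max(1, int(grad.size * ratio))
    idx = np.argpartition(np.abs(grad), -k)[-k:]    # indices of the k largest entries
    values = grad[idx]
    new_residual = grad.copy()
    new_residual[idx] = 0.0                         # sent entries leave the residual
    return idx, values, new_residual

def topk_decompress(idx, values, size):
    """Rebuild a dense gradient from the sparse (index, value) pairs."""
    dense = np.zeros(size)
    dense[idx] = values
    return dense

# Usage: compress a synthetic gradient before "sending" it over the network.
rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)
idx, vals, err = topk_compress(g, ratio=0.01)
print(f"sent {vals.size} of {g.size} values "
      f"({vals.size / g.size:.1%} of the original traffic)")
g_hat = topk_decompress(idx, vals, g.size)
```
- In a distributed setting, only the (index, value) pairs would cross the network, and the communication of one layer's compressed gradient can be overlapped with the computation of the next layer's gradient, which is the kind of pipelining the abstract refers to.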
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)