Communication-centric cross-stack acceleration for distributed machine learning
Li, Youjie
Permalink
https://hdl.handle.net/2142/117774
Description
- Title
- Communication-centric cross-stack acceleration for distributed machine learning
- Author(s)
- Li, Youjie
- Issue Date
- 2022-11-23
- Director of Research (if dissertation) or Advisor (if thesis)
- Kim, Nam Sung
- Doctoral Committee Chair(s)
- Kim, Nam Sung
- Committee Member(s)
- Hwu, Wen-mei
- Torrellas, Josep
- Chen, Deming
- Department of Study
- Electrical and Computer Engineering
- Discipline
- Electrical and Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Distributed Machine Learning
- Distributed Training
- Parallel Computing
- In-Network Computing
- Smart NICs
- Programmable Switches
- Pipelined SGD
- Machine Learning Framework
- Abstract
- Distributed training has long been the Holy Grail of machine learning systems, as it is an indispensable technique for meeting the ever-growing computation and memory demands caused by the unprecedented scaling of model sizes and data volumes. However, even distributed training takes inordinate time, a large fraction of which is spent on communication overhead, either in inter-server networks or in intra-server interconnects. In this dissertation, we propose cross-stack solutions spanning hardware, software, and algorithms to accelerate and scale distributed training systems. First, we propose in-network computing that leverages novel network hardware in modern data centers, such as programmable network interface cards and switches, not only to compress traffic volume in real time but also to reduce network hops on the fly. Second, we present new algorithms for efficient communication, such as gradient compression and pipelining computation with communication, which further shrink the network overhead while maintaining training convergence and model accuracy (an illustrative sketch of gradient compression follows this record's description). Third, we develop a next-generation machine learning framework built on novel schemes of task decomposition and late binding, which trains massive models even without sufficient memory while drastically reducing communication overhead over a server's interconnects.
- Graduation Semester
- 2022-12
- Type of Resource
- Thesis
- Copyright and License Information
- In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of the University of Illinois at Urbana-Champaign's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a license from RightsLink.
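Illustrative sketch
- As a companion to the abstract's mention of gradient compression (see also the "Pipelined SGD" and "Distributed Training" keywords), the following is a minimal, self-contained sketch of top-k gradient sparsification with error feedback. It is not taken from the dissertation: the function names, the 1% sparsity ratio, and the use of NumPy are assumptions chosen purely for illustration of the general technique.

```python
# Hypothetical sketch of top-k gradient compression with error feedback.
# Not the dissertation's algorithm; an illustration of the general idea of
# shrinking communication volume while preserving convergence.
import numpy as np

def topk_compress(grad, ratio=0.01, residual=None):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.

    Dropped entries accumulate in `residual` (error feedback) so they are
    eventually transmitted in later iterations.
    Returns (indices, values, new_residual).
    """
    if residual is not None:
        grad = grad + residual                      # fold in previously dropped error
    k = max(1, int(grad.size * ratio))
    idx = np.argpartition(np.abs(grad), -k)[-k:]    # indices of the k largest entries
    values = grad[idx]
    new_residual = grad.copy()
    new_residual[idx] = 0.0                         # sent entries leave the residual
    return idx, values, new_residual

def topk_decompress(idx, values, size):
    """Rebuild a dense gradient from the sparse (index, value) pairs."""
    dense = np.zeros(size)
    dense[idx] = values
    return dense

# Usage: compress a synthetic gradient before "sending" it over the network.
rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)
idx, vals, err = topk_compress(g, ratio=0.01)
print(f"sent {vals.size} of {g.size} values "
      f"({vals.size / g.size:.1%} of the original traffic)")
g_hat = topk_decompress(idx, vals, g.size)
```
- In a distributed setting, only the (index, value) pairs would cross the network, and the communication of one layer's compressed gradient can be overlapped with the computation of the next layer's gradient, which is the kind of pipelining the abstract refers to.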
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)