Accelerating distributed neural network training with a network-centric approach
Yuan, Yifan
Permalink
https://hdl.handle.net/2142/106434
Description
- Title
- Accelerating distributed neural network training with a network-centric approach
- Author(s)
- Yuan, Yifan
- Issue Date
- 2019-10-17
- Director of Research (if dissertation) or Advisor (if thesis)
- Kim, Nam Sung
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- distributed training
- accelerator
- Abstract
- Distributed training of deep neural networks (DNNs) is an important technique for reducing the training time of large DNNs across a wide range of applications. In existing distributed training approaches, however, the communication time spent periodically exchanging parameters (i.e., weights) and gradients among compute nodes over the network constitutes a large fraction of the total training time. To reduce the communication time, we propose an algorithm/hardware co-design, INCEPTIONN. More specifically, observing that gradients are much more tolerant of precision loss than parameters, we first propose a gradient-centric distributed training algorithm. Because it is designed to exchange only gradients among nodes in a distributed manner, it transfers less information, overlaps communication with computation more effectively, and can apply a more aggressive lossy compression algorithm to all information exchanged among nodes than traditional distributed algorithms. Second, exploiting the unique characteristics of gradient values, we propose a lossy compression algorithm optimized for compressing gradients. It achieves high compression ratios without notably affecting the accuracy of the trained DNNs. Lastly, we demonstrate that compression algorithms consume a large amount of CPU time, which in turn increases the total training time despite the reduced communication time. To tackle this, we propose an in-network computing approach that delegates the lossy compression task to hardware integrated with a Network Interface Card (NIC). Our experiments show that INCEPTIONN substantially reduces the communication time, and thus the total training time, of DNNs with little degradation in the accuracy of the trained DNNs. (An illustrative sketch of the gradient-compression idea follows the record below.)
- Graduation Semester
- 2019-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/106434
- Copyright and License Information
- Copyright 2019 Yifan Yuan
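To make the abstract's compression idea concrete, here is a minimal software-only sketch, assuming a simple mantissa-truncation scheme. It is not the thesis's actual INCEPTIONN compressor, whose gradient-tailored encoding is offloaded to NIC hardware; the function name `truncate_gradients` and the 8-bit mantissa budget are illustrative assumptions. It demonstrates only the key observation the abstract relies on: gradient values tolerate aggressive precision loss.

```python
import numpy as np

def truncate_gradients(grads: np.ndarray, keep_mantissa_bits: int = 8) -> np.ndarray:
    """Lossily compress float32 gradients by zeroing low-order mantissa bits.

    Illustrative stand-in only (hypothetical helper, not the thesis's
    algorithm). Truncating the 23-bit float32 mantissa down to
    `keep_mantissa_bits` bounds the worst-case relative error below
    2**-keep_mantissa_bits, regardless of the gradient's magnitude.
    """
    assert 0 <= keep_mantissa_bits <= 23          # float32 has a 23-bit mantissa
    drop = 23 - keep_mantissa_bits
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)
    bits = np.ascontiguousarray(grads, dtype=np.float32).view(np.uint32)
    return (bits & mask).view(np.float32)

if __name__ == "__main__":
    # Tiny demo: small, near-zero values (typical of gradients) survive
    # truncation with bounded relative error, which is why trained-model
    # accuracy is barely affected.
    rng = np.random.default_rng(0)
    g = rng.normal(0.0, 1e-3, size=1_000_000).astype(np.float32)
    g_lossy = truncate_gradients(g, keep_mantissa_bits=8)
    rel_err = np.abs(g - g_lossy) / np.maximum(np.abs(g), np.float32(1e-12))
    print(f"max relative error: {rel_err.max():.2e}")  # < 2**-8, i.e. < 0.4%
```

A design note on why truncation is a reasonable proxy: zeroing low-order bits keeps the error relative rather than absolute (important because gradient magnitudes vary widely), and the resulting runs of zero bits make the stream highly compressible by any downstream lossless coder. Per the abstract, the thesis's compressor goes further by exploiting additional characteristics of gradient values and by moving the work off the CPU into the NIC.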
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Dissertations and Theses - Electrical and Computer Engineering