Comparison of distributed training architecture for convolutional neural network in cloud
Shi, Dongwei
Permalink
https://hdl.handle.net/2142/105276
Description
Title
Comparison of distributed training architecture for convolutional neural network in cloud
Author(s)
Shi, Dongwei
Issue Date
2019-04-26
Director of Research (if dissertation) or Advisor (if thesis)
Hwu, Wen-Mei
Department of Study
Electrical & Computer Engineering
Discipline
Electrical & Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Deep Learning, Distributed System
Abstract
The rapid growth of data and the ever-increasing model complexity of deep neural networks (DNNs) have enabled breakthroughs in various artificial intelligence fields such as computer vision, natural language processing, and data mining. Training a DNN is a computationally intensive application that can be accelerated by parallel computing devices such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs). However, the amount of training data or the size of the model may exceed what a single machine can efficiently train or load. Distributed deep learning training addresses this issue by spreading the computation over several machines. Due to inter-node communication and other overheads of the distributed computing infrastructure, the performance improvement is not directly proportional to the number of machines. This thesis studies the computation time, memory, bandwidth, and other resources required to perform distributed deep learning. The approach of this work is to implement and deploy several data-parallel distributed deep learning algorithms on Google Cloud Platform (GCP), then analyze their performance and compare the communication overhead between the different algorithms. The results show that the Ring All-Reduce architecture, a bandwidth-optimal communication operation used for distributed deep learning, outperforms the Parameter Server architecture, a many-to-one architecture, in scalability. In addition, system usage information reported by GCP is leveraged to identify the bottlenecks of neural network training on a distributed architecture.
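To make the data-parallel setup described in the abstract concrete, the following is a minimal illustrative sketch (not the thesis's actual code) of one training step in which each worker computes gradients on its own data shard and the gradients are then averaged across workers with an all-reduce; with a backend such as NCCL or Gloo, this collective is typically implemented as the bandwidth-efficient ring all-reduce compared here against the Parameter Server approach. The PyTorch API, the toy model, and the launcher-provided environment variables are assumptions for illustration only.

```python
# Hypothetical sketch: data-parallel training step with gradient averaging
# via all-reduce (ring all-reduce under common backends). Not the thesis code.
import torch
import torch.distributed as dist
import torch.nn as nn


def train_step(model, optimizer, inputs, targets, loss_fn):
    """Run forward/backward on this worker's shard, then average gradients
    across all workers so every replica applies the identical update."""
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum this gradient tensor across all workers, then divide by the
            # number of workers to obtain the average gradient.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # One process per worker; RANK, WORLD_SIZE, and MASTER_ADDR are assumed to
    # be set by the job launcher (e.g., torchrun) in the environment.
    dist.init_process_group(backend="gloo", init_method="env://")

    model = nn.Linear(10, 2)                 # toy model standing in for a CNN
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(32, 10)                  # this worker's local mini-batch
    y = torch.randint(0, 2, (32,))

    print(train_step(model, optimizer, x, y, nn.CrossEntropyLoss()))
```

In the Parameter Server alternative studied in the thesis, workers would instead push gradients to and pull updated weights from a central node, which concentrates traffic on one machine; the all-reduce pattern above spreads that communication evenly across workers, which is the scalability difference the thesis measures.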