Comparison of distributed training architecture for convolutional neural network in cloud
Shi, Dongwei
Permalink
https://hdl.handle.net/2142/105276
Description
Title
Comparison of distributed training architecture for convolutional neural network in cloud
Author(s)
Shi, Dongwei
Issue Date
2019-04-26
Director of Research (if dissertation) or Advisor (if thesis)
Hwu, Wen-Mei
Department of Study
Electrical & Computer Engineering
Discipline
Electrical & Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Deep Learning, Distributed System
Abstract
The rapid growth of data and the ever-increasing model complexity of deep neural networks (DNNs) have enabled breakthroughs in various artificial intelligence fields such as computer vision, natural language processing, and data mining. Training a DNN is a computationally intensive application that can be accelerated by parallel computing devices such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs). However, the amount of training data or the size of the model may exceed what a single machine can efficiently train or load. Distributed deep learning training addresses this issue by spreading the computation over several machines. Due to inter-node communication and other overheads of the distributed computing infrastructure, the performance improvement is not directly proportional to the number of machines. This thesis studies the computation time, memory, bandwidth, and other resources required to perform distributed deep learning. The approach of this work is to implement and deploy several data-parallel distributed deep learning algorithms on Google Cloud Platform (GCP), then analyze their performance and compare the communication overhead between the different algorithms. The results show that the Ring All-Reduce architecture, a bandwidth-optimal communication operation used for distributed deep learning, outperforms the Parameter Server architecture, a many-to-one architecture, in scalability. In addition, system usage information reported by GCP is leveraged to identify the bottlenecks of neural network training on a distributed architecture.
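To make the data-parallel setup described in the abstract concrete, the following is a minimal illustrative sketch (not the thesis's actual code) of one training step in which each worker computes gradients on its own data shard and the gradients are then averaged across workers with an all-reduce; with a backend such as NCCL or Gloo, this collective is typically implemented as the bandwidth-efficient ring all-reduce compared here against the Parameter Server approach. The PyTorch API, the toy model, and the launcher-provided environment variables are assumptions for illustration only.

```python
# Hypothetical sketch: data-parallel training step with gradient averaging
# via all-reduce (ring all-reduce under common backends). Not the thesis code.
import torch
import torch.distributed as dist
import torch.nn as nn


def train_step(model, optimizer, inputs, targets, loss_fn):
    """Run forward/backward on this worker's shard, then average gradients
    across all workers so every replica applies the identical update."""
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum this gradient tensor across all workers, then divide by the
            # number of workers to obtain the average gradient.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # One process per worker; RANK, WORLD_SIZE, and MASTER_ADDR are assumed to
    # be set by the job launcher (e.g., torchrun) in the environment.
    dist.init_process_group(backend="gloo", init_method="env://")

    model = nn.Linear(10, 2)                 # toy model standing in for a CNN
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(32, 10)                  # this worker's local mini-batch
    y = torch.randint(0, 2, (32,))

    print(train_step(model, optimizer, x, y, nn.CrossEntropyLoss()))
```

In the Parameter Server alternative studied in the thesis, workers would instead push gradients to and pull updated weights from a central node, which concentrates traffic on one machine; the all-reduce pattern above spreads that communication evenly across workers, which is the scalability difference the thesis measures.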