Accelerating deep learning in the post-Moore’s era of computing
Hegde, Kartik
Permalink
https://hdl.handle.net/2142/120125
Description
- Title
- Accelerating deep learning in the post-Moore’s era of computing
- Author(s)
- Hegde, Kartik
- Issue Date
- 2023-04-28
- Director of Research (if dissertation) or Advisor (if thesis)
- Fletcher, Christopher W
- Doctoral Committee Chair(s)
- Fletcher, Christopher W
- Committee Member(s)
- Adve, Sarita
- Hwu, Wen-mei
- Pellauer, Michael
- Torrellas, Josep
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Deep Learning
- Computer Architecture
- Moore's Law
- Specialized Accelerators
- Sparsity
- Deep Neural Networks
- Programmable Accelerators
- Abstract
- Rapid growth in the availability of cheap and fast computing power, made possible by Moore’s scaling, has been a major tailwind behind the rise of modern deep learning. Over the last decade, however, the gap between the compute requirements of deep learning workloads and the peak compute capability of modern processors has widened rapidly due to two opposing trends. On the one hand, improving the accuracy of deep learning models requires larger models and datasets, which in turn require more compute. On the other hand, Moore’s scaling has been slowing down, and with it the rate of growth in peak compute throughput. While specialization has been seen as a way to close this gap, there is mounting evidence that specialization affords only a “one-time” boost in performance. To make matters worse, the growing speed gap between logic and memory, often called the memory wall, makes it hard to utilize even the compute already on-chip, let alone add more (a back-of-the-envelope illustration of this effect appears after this record’s metadata). The key question facing deep learning architects is therefore: how do we continue to accelerate deep learning workloads in the post-Moore’s era of computing? This thesis presents a set of techniques to accelerate deep learning workloads without scaling the number of transistors on-chip. At a high level, we classify them into two thrusts: first, techniques that reduce the total compute required by deep learning workloads without sacrificing accuracy and translate those savings into performance (a sparsity-based example is sketched below); and second, techniques that increase compute-per-transistor by improving the resource utilization of deep learning accelerators. Notably, the proposed techniques are not “zero-sum”: the performance gains come at negligible area/power overheads. We begin by describing a baseline deep learning accelerator (DLA) that is tailored to deep learning workloads and representative of the state of the art. We then describe concrete ways to evolve this baseline architecture to incorporate the proposed techniques. Finally, we provide a detailed analysis of the area overheads of the proposed techniques and demonstrate that they are minimal while the performance gains are significant. Overall, this thesis sheds light on directions computer architects can take to continue accelerating deep learning workloads without scaling the number of transistors on-chip. We believe that expanding the scope of specialization beyond hardware to other layers of the deep learning stack, combined with careful co-design, will enable us to continue to accelerate deep learning workloads in the post-Moore’s era of computing.
- Graduation Semester
- 2023-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Kartik Hegde
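The memory-wall argument in the abstract can be made concrete with a simple roofline model: attainable throughput is the minimum of the compute roof and the bandwidth roof (memory bandwidth times arithmetic intensity). The sketch below is illustrative only; the peak-throughput and bandwidth numbers are assumptions chosen for the example, not figures from the thesis.

def attainable_flops(peak_flops, mem_bw_bytes_per_s, arithmetic_intensity):
    # Roofline model: performance is capped either by the compute roof
    # or by the bandwidth roof, whichever binds first.
    return min(peak_flops, mem_bw_bytes_per_s * arithmetic_intensity)

peak = 100e12  # FLOP/s (assumed, illustrative)
bw = 2e12      # bytes/s (assumed, illustrative)
# Ridge point = peak / bw = 50 FLOPs per byte: below this intensity,
# the machine is memory-bound and on-chip compute sits idle.
for ai in (10, 50, 200):  # FLOPs per byte moved from memory
    frac = attainable_flops(peak, bw, ai) / peak
    print(f"arithmetic intensity {ai:>3} FLOP/B -> {frac:.0%} of peak usable")

At 10 FLOPs per byte, this hypothetical accelerator can use only 20% of its peak compute, which is why adding more transistors alone stops helping once the memory wall binds.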
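The first thrust above, reducing total compute without losing accuracy, is closely tied to the “Sparsity” keyword: many operand values in deep neural networks are zero (for example, post-ReLU activations or pruned weights), so multiply-accumulates involving them are ineffectual and can be skipped. The Python sketch below is a minimal software illustration of that idea, not the hardware mechanism the dissertation proposes; the 70% zero fraction and all names here are assumptions for illustration.

import random

def sparse_dot(a, b):
    # Skip multiply-accumulates (MACs) where either operand is zero;
    # such MACs are ineffectual and contribute nothing to the result.
    acc = 0.0
    effectual_macs = 0
    for x, y in zip(a, b):
        if x != 0.0 and y != 0.0:
            acc += x * y
            effectual_macs += 1
    return acc, effectual_macs

random.seed(0)
n = 1000
# Roughly 70% zeros per operand (an assumed, illustrative fraction).
a = [random.gauss(0, 1) if random.random() > 0.7 else 0.0 for _ in range(n)]
w = [random.gauss(0, 1) if random.random() > 0.7 else 0.0 for _ in range(n)]

value, macs = sparse_dot(a, w)
print(f"effectual MACs: {macs} of {n} dense MACs")

With about 30% nonzeros in each operand, only around 9% of MAC pairs are effectual, which is the kind of compute reduction a sparsity-aware accelerator can convert into speedup without changing the result.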
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois