Accelerating deep learning in the post-Moore’s era of computing
Hegde, Kartik
Permalink
https://hdl.handle.net/2142/120125
Description
- Title
- Accelerating deep learning in the post-Moore’s era of computing
- Author(s)
- Hegde, Kartik
- Issue Date
- 2023-04-28
- Director of Research (if dissertation) or Advisor (if thesis)
- Fletcher, Christopher W
- Doctoral Committee Chair(s)
- Fletcher, Christopher W
- Committee Member(s)
- Adve, Sarita
- Hwu, Wen-mei
- Pellauer, Michael
- Torrellas, Josep
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Deep Learning
- Computer Architecture
- Moore's Law
- Specialized Accelerators
- Sparsity
- Deep Neural Networks
- Programmable Accelerators
- Abstract
- Rapid growth in the availability of cheap and fast computing power, made possible by Moore’s scaling, has been a major tailwind behind the rise of modern deep learning. Over the last decade, however, the gap between the compute requirements of deep learning workloads and the peak compute capability of modern processors has widened rapidly due to two opposing trends. On the one hand, improving the accuracy of deep learning models requires larger models and datasets, which in turn require more compute. On the other hand, Moore’s scaling has been slowing down, and with it the rate of growth in peak compute throughput. While specialization has been seen as a way to close this gap, there is mounting evidence that specialization affords only a “one-time” boost in performance. To make matters worse, the growing speed gap between logic and memory, often called the memory wall, makes it hard to utilize even the compute already on-chip, let alone add more (a back-of-the-envelope illustration of this effect appears after this record’s metadata). The key question facing deep learning architects is therefore: how do we continue to accelerate deep learning workloads in the post-Moore’s era of computing? This thesis presents a set of techniques to accelerate deep learning workloads without scaling the number of transistors on-chip. At a high level, we classify them into two thrusts: first, techniques that reduce the total compute required by deep learning workloads without sacrificing accuracy and translate those savings into performance (a sparsity-based example is sketched below); and second, techniques that increase compute-per-transistor by improving the resource utilization of deep learning accelerators. Notably, the proposed techniques are not “zero-sum”: the performance gains come at negligible area/power overheads. We begin by describing a baseline deep learning accelerator (DLA) that is tailored to deep learning workloads and representative of the state of the art. We then describe concrete ways to evolve this baseline architecture to incorporate the proposed techniques. Finally, we provide a detailed analysis of the area overheads of the proposed techniques and demonstrate that they are minimal while the performance gains are significant. Overall, this thesis sheds light on directions computer architects can take to continue accelerating deep learning workloads without scaling the number of transistors on-chip. We believe that expanding the scope of specialization beyond hardware to other layers of the deep learning stack, combined with careful co-design, will enable us to continue to accelerate deep learning workloads in the post-Moore’s era of computing.
- Graduation Semester
- 2023-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Kartik Hegde
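The memory-wall argument in the abstract can be made concrete with a simple roofline model: attainable throughput is the minimum of the compute roof and the bandwidth roof (memory bandwidth times arithmetic intensity). The sketch below is illustrative only; the peak-throughput and bandwidth numbers are assumptions chosen for the example, not figures from the thesis.

def attainable_flops(peak_flops, mem_bw_bytes_per_s, arithmetic_intensity):
    # Roofline model: performance is capped either by the compute roof
    # or by the bandwidth roof, whichever binds first.
    return min(peak_flops, mem_bw_bytes_per_s * arithmetic_intensity)

peak = 100e12  # FLOP/s (assumed, illustrative)
bw = 2e12      # bytes/s (assumed, illustrative)
# Ridge point = peak / bw = 50 FLOPs per byte: below this intensity,
# the machine is memory-bound and on-chip compute sits idle.
for ai in (10, 50, 200):  # FLOPs per byte moved from memory
    frac = attainable_flops(peak, bw, ai) / peak
    print(f"arithmetic intensity {ai:>3} FLOP/B -> {frac:.0%} of peak usable")

At 10 FLOPs per byte, this hypothetical accelerator can use only 20% of its peak compute, which is why adding more transistors alone stops helping once the memory wall binds.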
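The first thrust above, reducing total compute without losing accuracy, is closely tied to the “Sparsity” keyword: many operand values in deep neural networks are zero (for example, post-ReLU activations or pruned weights), so multiply-accumulates involving them are ineffectual and can be skipped. The Python sketch below is a minimal software illustration of that idea, not the hardware mechanism the dissertation proposes; the 70% zero fraction and all names here are assumptions for illustration.

import random

def sparse_dot(a, b):
    # Skip multiply-accumulates (MACs) where either operand is zero;
    # such MACs are ineffectual and contribute nothing to the result.
    acc = 0.0
    effectual_macs = 0
    for x, y in zip(a, b):
        if x != 0.0 and y != 0.0:
            acc += x * y
            effectual_macs += 1
    return acc, effectual_macs

random.seed(0)
n = 1000
# Roughly 70% zeros per operand (an assumed, illustrative fraction).
a = [random.gauss(0, 1) if random.random() > 0.7 else 0.0 for _ in range(n)]
w = [random.gauss(0, 1) if random.random() > 0.7 else 0.0 for _ in range(n)]

value, macs = sparse_dot(a, w)
print(f"effectual MACs: {macs} of {n} dense MACs")

With about 30% nonzeros in each operand, only around 9% of MAC pairs are effectual, which is the kind of compute reduction a sparsity-aware accelerator can convert into speedup without changing the result.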
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois