Towards unified and transparent general-purpose hardware acceleration
Wang, Dong Kai
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/121340
Description
- Title
- Towards unified and transparent general-purpose hardware acceleration
- Author(s)
- Wang, Dong Kai
- Issue Date
- 2023-07-10
- Director of Research
- Kim, Nam Sung
- Doctoral Committee Chair(s)
- Kim, Nam Sung
- Committee Member(s)
- Kumar, Rakesh
- Torrellas, Josep
- Ghose, Saugata
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- hardware acceleration, CPU microarchitecture, dataflow processing
- Abstract
- Modern general-purpose processors suffer inherent inefficiencies caused by control overheads that arise from the von Neumann execution model. As Moore's Law continues to decline and the demand for efficient data processing continues to grow, researchers and engineers have developed various hardware accelerators that are less flexible but achieve significantly better efficiency than general-purpose processors. However, most of the accelerators deployed today are domain-specific accelerators (DSAs), which excel at accelerating specific application classes (e.g., machine learning) but are rarely adaptable to different types of workloads. Consequently, if we intend to accelerate a wide range of different applications, the costs of integrating numerous DSAs can quickly outweigh the benefits of improved execution efficiency. On the other hand, generalizable hardware acceleration methods that rely on configurable hardware, such as field-programmable gate arrays (FPGAs) or coarse-grained reconfigurable architectures (CGRAs), are not user-transparent. These methods often require code modifications or the adoption of additional software components, such as high-level synthesis, domain-specific programming languages, and specialized compilers and libraries. Fundamentally, there is no unified and transparent solution that accelerates a wide range of existing applications without requiring code changes or program recompilation. This dissertation seeks a solution that offers the binary compatibility of general-purpose processors while providing the efficiency of hardware accelerators. We propose two novel microarchitectures, DiAG and MESA, which tackle this challenge from two different angles.
DiAG is a drastically reimagined central processing unit (CPU) microarchitecture that can minimize execution latency by exploiting instruction-level parallelism or maximize execution throughput by exploiting data-level parallelism. DiAG employs an accelerator-inspired design that implicitly constructs dataflow graphs of the program as it executes, without incurring the expensive control overheads that hinder the efficiency of out-of-order processors. Despite its relatively larger hardware area, DiAG offers three main benefits over conventional CPUs: reduced frontend overheads, seamless instruction reuse, and loop-level pipelining. We design and implement a SystemVerilog prototype of a DiAG processor that supports the RISC-V instruction set. In our evaluations, a DiAG processor equipped with 512 functional units achieves an 18% speedup and a 63% improvement in energy efficiency over a comparable out-of-order CPU baseline. (An illustrative sketch of implicit dataflow-graph construction appears after the record fields below.)
DiAG's successor, MESA, takes an alternate approach: rather than radically transforming the CPU's microarchitecture, we leverage existing CPU structures to dynamically construct an accelerator architecture at runtime. MESA is a non-intrusive hardware controller that monitors programs executing on the CPU for acceleration opportunities. If the running application is deemed suitable for acceleration, MESA translates its machine code to an accelerator configuration that can be programmed onto a spatial accelerator to allow immediate offloading. Experiments show that a MESA-equipped spatial accelerator achieves a 33% speedup and an 86% gain in energy efficiency compared to a CPU baseline, while its area overhead amounts to less than 10% of the area of a single core.
Finally, combining lessons learned from DiAG and MESA, we propose a graph-based architecture model that can be used to generate custom processor microarchitectures. Future work will utilize this model to build an end-to-end architecture generation and optimization framework.
- Graduation Semester
- 2023-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Dong Kai Wang
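Illustrative sketch (not part of the dissertation record): the abstract describes DiAG as implicitly constructing dataflow graphs of a program as it executes. The Python sketch below shows, under simplified and assumed conditions, how such a graph can be derived from a linear instruction stream by linking each instruction's source registers to their most recent producers. The Instr and Node classes, the build_dataflow_graph function, and the toy instruction sequence are hypothetical illustrations and are not taken from DiAG's actual design.

    # Hypothetical sketch: build a dataflow graph from a linear instruction
    # stream by tracking, for each register, the last instruction that wrote it.
    from dataclasses import dataclass, field

    @dataclass
    class Instr:
        op: str        # e.g. "add", "mul", "ld"
        dst: str       # destination register, e.g. "x5"
        srcs: tuple    # source registers, e.g. ("x1", "x2")

    @dataclass
    class Node:
        instr: Instr
        deps: list = field(default_factory=list)  # producer nodes this node waits on

    def build_dataflow_graph(instrs):
        """Link each instruction to the most recent producer of each source register."""
        last_writer = {}   # register name -> Node that last wrote it
        nodes = []
        for ins in instrs:
            node = Node(ins)
            for src in ins.srcs:
                if src in last_writer:           # read-after-write dependence
                    node.deps.append(last_writer[src])
            last_writer[ins.dst] = node          # this node now produces dst
            nodes.append(node)
        return nodes

    # Toy example: the two loads have no mutual dependence and could issue in
    # parallel; the add waits on both loads, and the mul waits on the add.
    program = [
        Instr("ld",  "x1", ("x10",)),
        Instr("ld",  "x2", ("x11",)),
        Instr("add", "x3", ("x1", "x2")),
        Instr("mul", "x4", ("x3", "x1")),
    ]
    for n in build_dataflow_graph(program):
        print(n.instr.op, "<-", [d.instr.op for d in n.deps])

In this toy example the two loads share no dependence and could execute concurrently, while the add and mul form a dependence chain; exposing this structure without explicit out-of-order bookkeeping is the kind of opportunity the abstract attributes to DiAG.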
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)