Towards unified and transparent general-purpose hardware acceleration
Wang, Dong Kai
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/121340
Description
- Title
- Towards unified and transparent general-purpose hardware acceleration
- Author(s)
- Wang, Dong Kai
- Issue Date
- 2023-07-10
- Director of Research
- Kim, Nam Sung
- Doctoral Committee Chair(s)
- Kim, Nam Sung
- Committee Member(s)
- Kumar, Rakesh
- Torrellas, Josep
- Ghose, Saugata
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- hardware acceleration, CPU microarchitecture, dataflow processing
- Abstract
- Modern general-purpose processors suffer inherent inefficiencies caused by control overheads that arise from the von Neumann execution model. As Moore's Law continues to decline and the demand for efficient data processing continues to grow, researchers and engineers have developed various hardware accelerators that are less flexible but achieve significantly better efficiency than general-purpose processors. However, most of the accelerators deployed today are domain-specific accelerators (DSAs), which excel at accelerating specific application classes (e.g., machine learning) but are rarely adaptable to different types of workloads. Consequently, if we intend to accelerate a wide range of different applications, the costs of integrating numerous DSAs can quickly outweigh the benefits of improved execution efficiency. On the other hand, generalizable hardware acceleration methods that rely on configurable hardware, such as field-programmable gate arrays (FPGAs) or coarse-grained reconfigurable architectures (CGRAs), are not user-transparent. These methods often require code modifications or the adoption of additional software components, such as high-level synthesis, domain-specific programming languages, and specialized compilers and libraries. Fundamentally, there is no unified and transparent solution that accelerates a wide range of existing applications without requiring code changes or program recompilation. This dissertation seeks a solution that offers the binary compatibility of general-purpose processors while providing the efficiency of hardware accelerators. We propose two novel microarchitectures, DiAG and MESA, which tackle this challenge from two different angles.
DiAG is a drastically reimagined central processing unit (CPU) microarchitecture that can minimize execution latency by exploiting instruction-level parallelism or maximize execution throughput by exploiting data-level parallelism. DiAG employs an accelerator-inspired design that implicitly constructs dataflow graphs of the program as it executes, without incurring the expensive control overheads that hinder the efficiency of out-of-order processors. Despite its relatively larger hardware area, DiAG offers three main benefits over conventional CPUs: reduced frontend overheads, seamless instruction reuse, and loop-level pipelining. We design and implement a SystemVerilog prototype of a DiAG processor that supports the RISC-V instruction set. In our evaluations, a DiAG processor equipped with 512 functional units achieves an 18% speedup and a 63% improvement in energy efficiency over a comparable out-of-order CPU baseline. (An illustrative sketch of implicit dataflow-graph construction appears after the record fields below.)
DiAG's successor, MESA, takes an alternate approach: rather than radically transforming the CPU's microarchitecture, we leverage existing CPU structures to dynamically construct an accelerator architecture at runtime. MESA is a non-intrusive hardware controller that monitors programs executing on the CPU for acceleration opportunities. If the running application is deemed suitable for acceleration, MESA translates its machine code to an accelerator configuration that can be programmed onto a spatial accelerator to allow immediate offloading. Experiments show that a MESA-equipped spatial accelerator achieves a 33% speedup and an 86% gain in energy efficiency compared to a CPU baseline, while its area overhead amounts to less than 10% of the area of a single core.
Finally, combining lessons learned from DiAG and MESA, we propose a graph-based architecture model that can be used to generate custom processor microarchitectures. Future work will utilize this model to build an end-to-end architecture generation and optimization framework.
- Graduation Semester
- 2023-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Dong Kai Wang
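Illustrative sketch (not part of the dissertation record): the abstract describes DiAG as implicitly constructing dataflow graphs of a program as it executes. The Python sketch below shows, under simplified and assumed conditions, how such a graph can be derived from a linear instruction stream by linking each instruction's source registers to their most recent producers. The Instr and Node classes, the build_dataflow_graph function, and the toy instruction sequence are hypothetical illustrations and are not taken from DiAG's actual design.

    # Hypothetical sketch: build a dataflow graph from a linear instruction
    # stream by tracking, for each register, the last instruction that wrote it.
    from dataclasses import dataclass, field

    @dataclass
    class Instr:
        op: str        # e.g. "add", "mul", "ld"
        dst: str       # destination register, e.g. "x5"
        srcs: tuple    # source registers, e.g. ("x1", "x2")

    @dataclass
    class Node:
        instr: Instr
        deps: list = field(default_factory=list)  # producer nodes this node waits on

    def build_dataflow_graph(instrs):
        """Link each instruction to the most recent producer of each source register."""
        last_writer = {}   # register name -> Node that last wrote it
        nodes = []
        for ins in instrs:
            node = Node(ins)
            for src in ins.srcs:
                if src in last_writer:           # read-after-write dependence
                    node.deps.append(last_writer[src])
            last_writer[ins.dst] = node          # this node now produces dst
            nodes.append(node)
        return nodes

    # Toy example: the two loads have no mutual dependence and could issue in
    # parallel; the add waits on both loads, and the mul waits on the add.
    program = [
        Instr("ld",  "x1", ("x10",)),
        Instr("ld",  "x2", ("x11",)),
        Instr("add", "x3", ("x1", "x2")),
        Instr("mul", "x4", ("x3", "x1")),
    ]
    for n in build_dataflow_graph(program):
        print(n.instr.op, "<-", [d.instr.op for d in n.deps])

In this toy example the two loads share no dependence and could execute concurrently, while the add and mul form a dependence chain; exposing this structure without explicit out-of-order bookkeeping is the kind of opportunity the abstract attributes to DiAG.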
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)