Automated and scalable system solution for application acceleration
Zuo, Wei
Permalink
https://hdl.handle.net/2142/122164
Description
- Title
- Automated and scalable system solution for application acceleration
- Author(s)
- Zuo, Wei
- Issue Date
- 2023-12-01
- Director of Research (if dissertation) or Advisor (if thesis)
- Chen, Deming
- Doctoral Committee Chair(s)
- Chen, Deming
- Committee Member(s)
- Hwu, Wen-Mei
- Huang, Jian
- Hasegawa-Johnson, Mark
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Hardware acceleration, automated system solution, Hardware/Software Co-design
- Abstract
- Computation plays an increasingly important role in society, in fields such as entertainment, automotive, industrial, and healthcare applications, to name just a few. The advent of Deep Learning and Artificial Intelligence promises to keep this trend going well into the future. However, the growing computational demand also creates a major challenge when computational resources are limited. A general goal is for computational workloads to execute with low latency and within reasonable energy or power budgets. One important method for achieving efficient and effective application implementations is to explore and exploit the rich variety of computational platforms available today, from traditional CPUs to custom-designed logic on Field Programmable Gate Arrays (FPGAs). However, customizing each application to work in the most efficient manner across multiple, highly flexible computational components is a very challenging task that involves many difficult problems. These include how to estimate the downstream impact of early design choices in application implementation, how to choose the right implementation from a very large number of functionally equivalent implementations of the application on the underlying platform, and how to handle control-flow divergence, which increases both the complexity of the design space and the difficulty of searching it effectively for the best implementation. In addition, expending additional human resources on this complex task may not meet the demands of rapid application development cycles in today’s industry. Hence, an automated solution is desired. Addressing this critical need, this dissertation solves the problem of automatically converting application source code to an accelerated, efficient implementation on given computational hardware, through mathematical modeling and algorithmic search for the best implementation.
To solve this problem in a manner that covers a large class of applications, this dissertation addresses major challenges in automated design through four key components: (1) accurate hardware performance/power/area modeling, given a software component to be mapped to hardware, without resorting to detailed implementation of each hardware design point; (2) efficient and accurate software performance/power modeling, given a software component to be executed on a target processor, without full simulation of the software execution on the processor; (3) a system-level modeling and design space exploration tool which, for a given application and target computational platform (such as an SoC: System-on-Chip), constructs the entire design space and efficiently traverses it to find near-optimal design points for best application latency within area and power budget constraints; and (4) adopting and extending the above-mentioned techniques to accelerate a wide class of applications in deep learning and natural language processing, called Mixture-of-Experts, where opaque model decisions lead to control-flow divergence. At a low level, the methods presented in the dissertation make it very inexpensive to model the trade-offs among different design choices, whereas traditional approaches would require expensive steps, either detailed simulation or low-level hardware implementation. At a higher level, our methods leverage these efficient modeling approaches to formulate design-space exploration as a mathematical problem, which is unfortunately NP-hard. Simplifications to this problem formulation are then introduced, yielding solutions within a few minutes or hours while still delivering high-quality implementations.
Both aspects are critical in a world filled with computations: ensuring low-latency and highly efficient deployments makes applications more useful and feasible, and delivering such implementations automatically and rapidly, without significant human involvement, allows such design solutions to be produced at scale. Furthermore, trends in Deep Learning point to an increase in applications that are latency-critical while being executed under resource constraints. Experiments on computer vision and natural language processing models indicate that the methods developed here are applicable to this critical and growing field as well. It is hoped that the methods presented in this dissertation provide automated and scalable system solutions for application acceleration and encourage further research in these rapidly evolving areas.
- Graduation Semester
- 2023-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Wei Zuo
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois