Efficient AI hardware acceleration
Zhang, Xiaofan
Permalink
https://hdl.handle.net/2142/117735
Description
- Title
- Efficient AI hardware acceleration
- Author(s)
- Zhang, Xiaofan
- Issue Date
- 2022-10-21
- Director of Research (if dissertation) or Advisor (if thesis)
- Chen, Deming
- Hwu, Wen-Mei
- Patel, Sanjay
- Xiong, Jinjun
- Doctoral Committee Chair(s)
- Chen, Deming
- Committee Member(s)
- Huang, Jian
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- AI Systems
- Deep Neural Networks
- Hardware Acceleration
- Energy-efficient Computing
- Abstract
- The great success of artificial intelligence (AI) has been driven in part by the continuous improvement of deep neural networks (DNNs) with deeper and more sophisticated model structures. DNNs have thus become more compute- and memory-intensive, posing significant challenges for hardware deployment: they must deliver not only high inference accuracy but also satisfactory inference speed, throughput, and energy efficiency. Further challenges come from limited hardware resources, restricted power budgets, tedious hardware design programming, intricate hardware verification, and time-consuming design space exploration. To address these challenges, this dissertation proposes a comprehensive toolset for efficient AI hardware acceleration targeting various edge and cloud scenarios. It covers the full stack of AI applications, from delivering hardware-efficient DNNs on the algorithm side to building domain-specific hardware accelerators for existing or customized hardware platforms. Major novelties include HLS-based accelerator design and optimization strategies, end-to-end automation tools, and DNN-accelerator co-design methods, which together enable highly efficient hardware acceleration for a range of popular AI applications.
Our proposed solution starts with efficient DNN hardware accelerator design using a High-Level Synthesis (HLS) design flow. Customized hardware accelerators can be developed at a higher abstraction level and with a fast turnaround to support emerging AI applications. We demonstrate this method by implementing the first FPGA-based accelerator for the Long-term Recurrent Convolutional Network (LRCN) to enable real-time image captioning. Our design achieves a 3.1X speedup and 17.5X higher efficiency compared to an optimized GPU-based solution. We then propose DNNBuilder to improve the efficiency of accelerator design and optimization.
DNNBuilder is an end-to-end automation tool providing an integrated design flow from DNN design in deep learning frameworks to board-level FPGA implementations. Users no longer need to design and optimize accelerators manually but can rely on auto-generated hardware accelerators for their desired AI workloads. Novel designs include a fine-grained layer-based pipeline architecture and a column-based cache scheme, which achieve 7.7X and 43X reductions in latency and on-chip memory usage, respectively. We demonstrate DNNBuilder by generating state-of-the-art accelerators for various AI services with high quality and energy efficiency.
In addition, we propose a series of efficient design methods to perform algorithm/accelerator co-design and co-optimization, which provide systematic strategies for integrating hardware and software designs. We propose SkyNet, a co-design strategy for hardware-efficient DNN design and deployment with comprehensive awareness of hardware constraints. Its effectiveness is demonstrated by outperforming 100+ competitors in the IEEE/ACM Design Automation Conference System Design Contest (DAC-SDC) for low-power real-time object detection. We then extend the co-design approach to two emerging and challenging AI tasks in real-life edge and cloud scenarios. We propose F-CAD to deliver customized accelerators for Virtual Reality (VR) applications running on extremely lightweight edge devices; its generated designs achieve up to 4.0X higher throughput and up to 62.5% higher energy efficiency than state-of-the-art designs. We propose AutoDistill to address the difficulties of serving large-scale Natural Language Processing (NLP) models in the cloud. Following the co-design strategy, it integrates effective model compression and neural architecture search technologies to deliver high-quality, hardware-efficient pre-trained NLP models.
Evaluated on the latest TPU chip, the AutoDistill-generated NLP model achieves 3.2% higher accuracy and 1.44X faster hardware performance than the state-of-the-art. Together, these techniques form a new comprehensive toolset that covers hardware accelerator design and optimization at different abstraction levels, as well as DNN-accelerator co-design, to deliver efficient AI acceleration for edge and cloud scenarios. It bridges the gap between DNN designs and their hardware deployment and enables easily accessible, high-quality, and sustainable AI acceleration. As a result, we are able to demonstrate state-of-the-art solutions for various popular and emerging AI applications.
- Graduation Semester
- 2022-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Xiaofan Zhang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY