Efficient AI hardware acceleration
Zhang, Xiaofan
Permalink
https://hdl.handle.net/2142/117735
Description
- Title
- Efficient AI hardware acceleration
- Author(s)
- Zhang, Xiaofan
- Issue Date
- 2022-10-21
- Director of Research (if dissertation) or Advisor (if thesis)
- Chen, Deming
- Hwu, Wen-Mei
- Patel, Sanjay
- Xiong, Jinjun
- Doctoral Committee Chair(s)
- Chen, Deming
- Committee Member(s)
- Huang, Jian
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- AI Systems
- Deep Neural Networks
- Hardware Acceleration
- Energy-efficient Computing
- Abstract
- The great success of artificial intelligence (AI) has been driven in part by the continuous improvement of deep neural networks (DNNs) with deeper and more sophisticated model structures. DNNs have thus become more compute- and memory-intensive, posing significant challenges for hardware deployment: they must deliver not only high inference accuracy but also satisfactory inference speed, throughput, and energy efficiency. Further challenges come from limited hardware resources, restricted power budgets, tedious hardware design programming, intricate hardware verification, and time-consuming design space exploration. To address these challenges, this dissertation proposes a comprehensive toolset for efficient AI hardware acceleration targeting various edge and cloud scenarios. It covers the full stack of AI applications, from delivering hardware-efficient DNNs on the algorithm side to building domain-specific hardware accelerators for existing or customized hardware platforms. Major novelties include HLS-based accelerator design and optimization strategies, end-to-end automation tools, and DNN-accelerator co-design methods, which together enable highly efficient hardware acceleration for a range of popular AI applications.
Our proposed solution starts with efficient DNN hardware accelerator design using a High-Level Synthesis (HLS) design flow. Customized hardware accelerators can be developed at a higher abstraction level and with a fast turnaround to support emerging AI applications. We demonstrate this method by implementing the first FPGA-based accelerator for the Long-term Recurrent Convolutional Network (LRCN) to enable real-time image captioning. Our design achieves a 3.1X speedup and 17.5X higher efficiency compared to an optimized GPU-based solution. We then propose DNNBuilder to improve the efficiency of accelerator design and optimization.
DNNBuilder is an end-to-end automation tool providing an integrated design flow from DNN design in deep learning frameworks to board-level FPGA implementations. Users no longer need to design and optimize accelerators manually but can rely on auto-generated hardware accelerators for their desired AI workloads. Novel designs include a fine-grained layer-based pipeline architecture and a column-based cache scheme, which achieve 7.7X and 43X reductions in latency and on-chip memory usage, respectively. We demonstrate DNNBuilder by generating state-of-the-art accelerators for various AI services with high quality and energy efficiency.
In addition, we propose a series of efficient design methods to perform algorithm/accelerator co-design and co-optimization, which provide systematic strategies for integrating hardware and software designs. We propose SkyNet, a co-design strategy for hardware-efficient DNN design and deployment with comprehensive awareness of hardware constraints. Its effectiveness is demonstrated by outperforming 100+ competitors in the IEEE/ACM Design Automation Conference System Design Contest (DAC-SDC) for low-power real-time object detection. We then extend the co-design approach to two emerging and challenging AI tasks in real-life edge and cloud scenarios. We propose F-CAD to deliver customized accelerators for Virtual Reality (VR) applications running on extremely lightweight edge devices; its generated designs achieve up to 4.0X higher throughput and up to 62.5% higher energy efficiency than state-of-the-art designs. We propose AutoDistill to address the difficulties of serving large-scale Natural Language Processing (NLP) models in the cloud. Following the co-design strategy, it integrates effective model compression and neural architecture search technologies to deliver high-quality, hardware-efficient pre-trained NLP models.
Evaluated on the latest TPU chip, the AutoDistill-generated NLP model achieves 3.2% higher accuracy and 1.44X faster hardware performance than the state-of-the-art. Together, these techniques form a new comprehensive toolset that covers hardware accelerator design and optimization at different abstraction levels, as well as DNN-accelerator co-design, to deliver efficient AI acceleration for edge and cloud scenarios. It bridges the gap between DNN designs and their hardware deployment and enables easily accessible, high-quality, and sustainable AI acceleration. As a result, we are able to demonstrate state-of-the-art solutions for various popular and emerging AI applications.
- Graduation Semester
- 2022-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Xiaofan Zhang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY