TwinDNN: A tale of two deep neural networks
Jeong, Hyunmin
Permalink: https://hdl.handle.net/2142/110581
Description
- Title: TwinDNN: A tale of two deep neural networks
- Author(s): Jeong, Hyunmin
- Issue Date: 2021-04-27
- Director of Research (if dissertation) or Advisor (if thesis): Chen, Deming
- Department of Study: Electrical & Computer Engineering
- Discipline: Electrical & Computer Engineering
- Degree Granting Institution: University of Illinois at Urbana-Champaign
- Degree Name: M.S.
- Degree Level: Thesis
- Keyword(s): Hardware Accelerator; High-Level Synthesis; Machine Learning; Neural Network Quantization
- Abstract: Compression techniques for deep neural networks (DNNs), such as weight quantization, have been widely investigated to reduce model size so that DNNs can be implemented on hardware with strict resource constraints. However, one major disadvantage of model compression is accuracy degradation. To deal with this problem effectively, we propose a new compressed-network inference scheme that pairs a high-accuracy but slower DNN with a highly compressed version of it that delivers much faster inference at lower accuracy. During inference, we measure the confidence of the compressed DNN's prediction and run the original network only on the inputs for which the compressed DNN is not confident. The proposed design balances the resources available on the hardware and can deliver overall accuracy close to that of the high-accuracy model with inference speed close to that of the compressed DNN. We demonstrate our design on two image classification tasks: CIFAR-10 and ImageNet. Our experiments show that our design can recover up to 94% of the accuracy drop caused by extreme network compression, with more than 90% higher throughput compared to using only the original DNN. This is more than 17% extra accuracy recovery and 36% extra speedup compared to previous work with a similar concept on VGG-16. This is the first work that uses a highly compressed DNN alongside the original DNN in parallel to achieve high accuracy and speed at the same time, while maintaining resource balance by using two different main computation sources on the field-programmable gate array (FPGA). (A minimal code sketch of this confidence-gated scheme follows this record.)
- Graduation Semester: 2021-05
- Type of Resource: Thesis
- Permalink: http://hdl.handle.net/2142/110581
- Copyright and License Information: Copyright 2021 Hyunmin Jeong
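
The abstract above describes a confidence-gated, two-model inference scheme: run the compressed DNN first and fall back to the original DNN only for low-confidence inputs. Below is a minimal PyTorch sketch of that idea. It is illustrative only; the toy model definitions, the top-1 softmax confidence measure, and the 0.9 threshold are assumptions, not the thesis's quantized FPGA implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the two networks: a small "compressed" classifier
# and a larger "original" one (both untrained, CIFAR-10-shaped inputs assumed).
compressed_model = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)
)
original_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

def twin_inference(x: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    """Predict with the compressed model; rerun only low-confidence inputs
    through the original model. Confidence = top-1 softmax probability."""
    with torch.no_grad():
        probs = F.softmax(compressed_model(x), dim=1)
        confidence, preds = probs.max(dim=1)
        low_conf = confidence < threshold  # inputs the compressed DNN is unsure about
        if low_conf.any():
            preds[low_conf] = original_model(x[low_conf]).argmax(dim=1)
    return preds

# Usage: a batch of 8 random CIFAR-10-sized images.
batch = torch.randn(8, 3, 32, 32)
print(twin_inference(batch))
```

In the thesis, the two networks run on different computation resources of the FPGA to keep the design balanced; here both run on the same device purely to show the control flow.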
Owning Collections
- Graduate Dissertations and Theses at Illinois (PRIMARY)
- Dissertations and Theses - Electrical and Computer Engineering