Accelerating Sparse CNNs on FPGAs
Rejive, Joseph
Permalink
https://hdl.handle.net/2142/117046
Description
- Title
- Accelerating Sparse CNNs on FPGAs
- Author(s)
- Rejive, Joseph
- Issue Date
- 2022-12-16
- Keyword(s)
- FPGA; HLS; CNN; Deep Learning Accelerator
- Abstract
- Convolutional Neural Networks (CNNs) are a class of neural networks that perform exceptionally well on computer vision tasks such as image classification and segmentation. Many vision-based tasks benefit greatly from running in real time; for example, an autonomous-driving system such as Tesla's Autopilot must make decisions within milliseconds based on camera input. It is therefore important to have hardware that can perform computations quickly and in parallel. By applying pruning and quantization, we can reduce the model size while preserving similar accuracy, which can yield substantial inference speedups if the hardware supports sparsity. In this thesis, I present a Field-Programmable Gate Array (FPGA) based accelerator that supports sparsity in convolutional neural networks. It features a dataflow approach that continuously streams data from off-chip memory and overlaps execution with memory operations to keep the compute units highly utilized. To implement the design, I used High-Level Synthesis (HLS) tools to shorten development time. Results show that the dataflow architecture outperforms other FPGA architectures by an order of magnitude. Furthermore, the dataflow approach is nearly 12x faster than a reference software convolution implementation.
- Type of Resource
- text
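The speedup the abstract attributes to pruning comes from skipping multiply-accumulates for zeroed weights. As a minimal illustration (a hypothetical sketch, not the thesis's actual HLS kernel), the kernel's nonzero weights can be compressed into (offset, value) pairs so the convolution loop iterates only over surviving taps:

```cpp
#include <vector>

// Hypothetical sketch of sparsity-aware convolution: after pruning, most
// kernel weights are zero, so we store only the nonzero taps (value plus
// its row/column offset) and skip the pruned multiplies entirely.

struct NZWeight { int dr, dc; float v; };  // nonzero tap with its offset

// Compress a K x K kernel into its nonzero entries.
std::vector<NZWeight> compress(const std::vector<std::vector<float>>& k) {
    std::vector<NZWeight> nz;
    for (int r = 0; r < (int)k.size(); ++r)
        for (int c = 0; c < (int)k[r].size(); ++c)
            if (k[r][c] != 0.0f) nz.push_back({r, c, k[r][c]});
    return nz;
}

// Valid-mode 2-D convolution that visits only the nonzero taps.
std::vector<std::vector<float>> sparse_conv(
        const std::vector<std::vector<float>>& in,
        const std::vector<NZWeight>& nz, int K) {
    int H = (int)in.size(), W = (int)in[0].size();
    std::vector<std::vector<float>> out(
        H - K + 1, std::vector<float>(W - K + 1, 0.0f));
    for (int r = 0; r <= H - K; ++r)
        for (int c = 0; c <= W - K; ++c)
            for (const auto& w : nz)  // pruned (zero) taps are never touched
                out[r][c] += w.v * in[r + w.dr][c + w.dc];
    return out;
}
```

With a 50%-pruned 2x2 kernel the inner loop runs half as many multiply-accumulates; a hardware dataflow pipeline exploits the same skipping, streaming activations while the compute units process only nonzero weights.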
Owning Collections
Senior Theses - Electrical and Computer Engineering (Primary)
The best of ECE undergraduate research