Utilizing GPU tensor cores for algorithmic acceleration
Durrani, Sultan Hayat Khan
Permalink
https://hdl.handle.net/2142/108349
Description
Title
Utilizing GPU tensor cores for algorithmic acceleration
Author(s)
Durrani, Sultan Hayat Khan
Issue Date
2020-05-13
Advisor
Hwu, Wen-Mei W
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Computer Architecture
GPU
Tensor Cores
CUDA
FFT
Abstract
There has been a surge in demand for domain-specific architectures, driven by wide-ranging deep learning applications such as image classification, speech recognition, healthcare, and self-driving cars. Matrix multiplication acceleration has been a popular design choice when building these specialized units to boost deep learning training and inference. NVIDIA's Volta architecture introduced Tensor Cores, which promised a 3x speedup over the preceding Pascal architecture. Despite these favorable performance gains, the accelerators have not been applied extensively to a wider class of algorithms. In this thesis we introduce novel ways of mapping various algorithms onto Tensor Cores. We implement Tensor Core based reduction, power iteration, and Fast Fourier Transform (FFT), and show that effectively utilizing these GPU compute resources yields substantial performance gains. Our reduction achieves a 1.5x speedup over the CUB library; power iteration achieves, on average, a 2x speedup over Thrust- and cuBLAS-based implementations; and our FFT implementation outperforms cuFFT by up to 8x.
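The common thread in these mappings is recasting each algorithm's core step as small dense matrix multiplications that Tensor Cores execute natively; a reduction, for instance, can be expressed as multiplying a tile of inputs by a matrix of ones. The CUDA kernel below is a minimal sketch of that idea using the WMMA API, not the thesis's implementation; the kernel name, the single 16x16 half-precision tile, and the one-warp launch configuration are illustrative assumptions.

    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    // Illustrative sketch: reduce one 16x16 half-precision tile with a single
    // Tensor Core MMA. Launch with one warp (32 threads), sm_70 or newer.
    __global__ void tile_reduce_wmma(const half *tile, float *out) {
        // acc = ones(16x16) * tile(16x16); every row of acc then holds the
        // column sums of the input tile.
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> ones_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> data_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

        wmma::fill_fragment(ones_frag, __float2half(1.0f)); // matrix of all ones
        wmma::load_matrix_sync(data_frag, tile, 16);         // leading dimension 16
        wmma::fill_fragment(acc_frag, 0.0f);
        wmma::mma_sync(acc_frag, ones_frag, data_frag, acc_frag);

        // Spill the accumulator to shared memory and add up row 0 (the column
        // sums) to obtain the total of all 256 input elements.
        __shared__ __align__(32) float partial[16 * 16];
        wmma::store_matrix_sync(partial, acc_frag, 16, wmma::mem_row_major);
        __syncwarp();
        if (threadIdx.x == 0) {
            float sum = 0.0f;
            for (int j = 0; j < 16; ++j) sum += partial[j];
            *out = sum;
        }
    }

A host-side launch for this sketch would be tile_reduce_wmma<<<1, 32>>>(d_tile, d_out), with d_tile holding 256 half values. The actual thesis kernels necessarily operate on much larger inputs and on power iteration and FFT as well; this block only illustrates the basic Tensor Core programming pattern.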