Towards automatic characterization of computation and communication in CUDA programs
Grande, Dominic
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. To access an Illinois-restricted dissertation or thesis from outside the university, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/99991
Description
Title
Towards automatic characterization of computation and communication in CUDA programs
Author(s)
Grande, Dominic
Contributor(s)
Hwu, Wen-Mei
Issue Date
2018-05
Keyword(s)
CUDA programs
computation and communication in CUDA programs
CUDA tools and machine learning applications
Abstract
As complex heterogeneous applications become more common, it has become increasingly difficult to profile and characterize their performance on a variety of machines. To address this added complexity, this work proposes a series of tools that trace and profile an application's interaction with the underlying system. So far these tools are limited to profiling CUDA applications, owing to the maturity of Nvidia's CUDA API and of the CUDA application ecosystem.
In the future, however, these tools could be extended to any heterogeneous system. This thesis focuses primarily on machine learning applications, such as MxNet or Caffe2, because of their heavy reliance on GPGPUs and CUDA for computation. It concentrates on the new tools heteroprof and heteroprof-rs, but briefly mentions other tools that are being developed in parallel by others. Those tools focus primarily on characterizing the system itself rather than on software interaction with the system. The eventual goal is for all of these tools to be used in tandem to automate the design of the system, automate the design of the application, or improve scheduling on a given system.
The tools heteroprof and heteroprof-rs were able to generate a dependence graph for the execution of both MxNet and Caffe2. Heteroprof-rs also produced a breakdown of program execution, showing that for both MxNet and Caffe2 there remains room for optimizations that increase the overlap of computation and memory transfers on GPUs during execution.
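As an illustration of what "overlap of computation and memory transfers" means in a trace, the following is a minimal sketch, not the method or output format used by heteroprof or heteroprof-rs. It assumes a hypothetical trace in which kernel executions and memory copies are each recorded as (start, end) timestamps, and it computes how much of the copy time is hidden behind computation. The function names and the sample data are invented for this example.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total(intervals):
    """Sum of the lengths of a list of disjoint intervals."""
    return sum(end - start for start, end in intervals)


def overlap(a, b):
    """Time during which an interval from a and an interval from b are both active."""
    a, b = merge(a), merge(b)
    i = j = 0
    shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical trace data (microseconds); not taken from the thesis.
kernels = [(0, 40), (50, 90)]
copies = [(30, 60), (85, 100)]

busy_compute = total(merge(kernels))
busy_copy = total(merge(copies))
both = overlap(kernels, copies)
hidden_fraction = both / busy_copy  # share of copy time overlapped with compute

print(f"compute {busy_compute} us, copy {busy_copy} us, overlapped {both} us")
```

A hidden fraction well below 1 in such a breakdown would indicate transfers that are not overlapped with kernels, which is the kind of optimization opportunity the abstract describes.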
Type of Resource
text
Language
en
Sponsor(s)/Grant Number(s)
IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR)