Towards automatic characterization of computation and communication in CUDA programs

Grande, Dominic

Permalink

https://hdl.handle.net/2142/99991

Description

Title

Towards automatic characterization of computation and communication in CUDA programs

Author(s)

Grande, Dominic

Contributor(s)

Hwu, Wen-Mei

Issue Date

2018-05

Keyword(s)

CUDA programs
computation and communication in CUDA programs
CUDA tools and machine learning applications

Abstract

As heterogeneous applications grow more complex and more common, it has become increasingly difficult to profile and characterize their performance across a variety of machines. To address this complexity, this work proposes a series of tools for tracing and profiling an application's interaction with the underlying system. So far these tools are limited to CUDA applications, owing to the maturity of Nvidia's CUDA API and of the CUDA application ecosystem; in the future they could be extended to any heterogeneous system. This thesis focuses primarily on machine learning applications, such as MxNet and Caffe2, because of their heavy reliance on GPGPUs and CUDA for computation. The work centers on two new tools, heteroprof and heteroprof-rs, and briefly mentions other tools being developed in parallel by others. Those other tools focus primarily on characterizing the system itself rather than the software's interaction with it. The eventual goal is for these tools to be used in tandem to automate the design of the system, automate the design of the application, or improve scheduling on a given system. Heteroprof and heteroprof-rs were able to generate a dependence graph for the execution of both MxNet and Caffe2. Heteroprof-rs also produced a breakdown of program execution, showing that for both MxNet and Caffe2 there remains room for optimizations that increase the overlap of computation and memory transfers on GPUs during execution.

Type of Resource

text

Language

Sponsor(s)/Grant Number(s)

IBM-ILLINOIS Center for Cognition Computing Systems Research (C3SR)

Owning Collections

Senior Theses - Electrical and Computer Engineering (primary collection)

The best of ECE undergraduate research
