Toward predictable execution of real-time workloads on modern GPUs

Singh, Jayati

Permalink

https://hdl.handle.net/2142/110571

Description

Title

Toward predictable execution of real-time workloads on modern GPUs

Author(s)

Singh, Jayati

Issue Date

2021-04-27

Director of Research (if dissertation) or Advisor (if thesis)

Caccamo, Marco

Department of Study

Electrical and Computer Engineering

Discipline

Electrical and Computer Engineering

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Date of Ingest

2021-09-17T01:11:18Z

Keyword(s)

GPU
real-time
spatial partitioning
warp
scheduling
predictable execution

Abstract

Over the last decade, the computational demands of real-time systems have grown beyond what existing multi-core processors can meet. Graphics processing units (GPUs) are a cost-effective way to serve such systems, and their high throughput and energy efficiency have led to widespread adoption. Most real-time systems today run multiple tasks that use the GPU, and GPUs gain more processing units with every generation. Hence, prior solutions that give each task exclusive access to the GPU are no longer feasible from either a real-time or a cost perspective. This necessitates predictable GPU multi-tasking, which unfortunately cannot be trivially achieved on modern GPUs: new spatial and temporal scheduling policies need to be explored and enforced to enable predictable execution of GPU tasks. This thesis therefore investigates two approaches to achieving predictable execution on NVIDIA GPUs.

The first approach is spatial partitioning (SP), in which different tasks execute on disjoint sets of GPU processing units. Industry and the research community have put considerable effort into enabling GPU SP, but how to leverage SP to improve schedulability has not yet been investigated thoroughly. We therefore propose heuristics that partition the GPU into sets of processing units and assign tasks to each partition, with the goal of increasing utilization while respecting the tasks' timing constraints.

The second approach is simultaneous multi-kernel (SMK) execution, which arbitrates between tasks at the lowest level of execution, the warp level. We propose a real-time, priority-aware warp scheduler and compare its performance against kernel-agnostic policies such as loose round-robin and greedy-then-oldest, which are implemented in NVIDIA hardware today. We implement and evaluate the proposed warp scheduling policy on GPGPU-Sim.
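
The warp-level policies named in the abstract can be illustrated with a small sketch. This is not the scheduler proposed in the thesis, nor GPGPU-Sim code. It is a simplified model under stated assumptions: the `Warp` fields, the convention that a lower `kernel_priority` value means a more urgent kernel, and the tie-break by age are all hypothetical choices made only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warp:
    warp_id: int
    kernel_priority: int  # hypothetical: lower value = more urgent kernel
    age: int              # hypothetical: launch order, smaller = older
    ready: bool           # True if the warp can issue this cycle


def loose_round_robin(warps: List[Warp], last_issued: int) -> Optional[Warp]:
    """Pick the first ready warp after list index `last_issued`, wrapping around."""
    n = len(warps)
    for offset in range(1, n + 1):
        candidate = warps[(last_issued + offset) % n]
        if candidate.ready:
            return candidate
    return None


def greedy_then_oldest(warps: List[Warp], last_issued: int) -> Optional[Warp]:
    """Keep issuing from the same warp while it is ready, else take the oldest ready warp."""
    if warps[last_issued].ready:
        return warps[last_issued]
    ready = [w for w in warps if w.ready]
    return min(ready, key=lambda w: w.age, default=None)


def priority_aware(warps: List[Warp]) -> Optional[Warp]:
    """Assumed policy: most urgent kernel first, oldest warp as the tie-break."""
    ready = [w for w in warps if w.ready]
    return min(ready, key=lambda w: (w.kernel_priority, w.age), default=None)


# Two warps from a low-priority kernel (priority 2) and two from an urgent one (priority 1).
warps = [
    Warp(warp_id=0, kernel_priority=2, age=0, ready=True),
    Warp(warp_id=1, kernel_priority=2, age=1, ready=False),
    Warp(warp_id=2, kernel_priority=1, age=5, ready=True),
    Warp(warp_id=3, kernel_priority=1, age=6, ready=True),
]
```

In this model the two kernel-agnostic policies choose a warp without looking at which kernel it belongs to, while the priority-aware variant always prefers a ready warp from the most urgent kernel.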

Graduation Semester

2021-05

Type of Resource

Thesis

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Electrical and Computer Engineering

Dissertations and Theses in Electrical and Computer Engineering
