Using hybrid shared and distributed caching for mixed-coherency GPU workloads

Anssari, Nasser

Content Files

Nasser_Anssari.pdf

Permalink

https://hdl.handle.net/2142/42361

Description

Title

Using hybrid shared and distributed caching for mixed-coherency GPU workloads

Author(s)

Anssari, Nasser

Issue Date

2013-02-03T19:36:23Z

Director of Research (if dissertation) or Advisor (if thesis)

Hwu, Wen-Mei W.

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Date of Ingest

2013-02-03T19:36:23Z

Keyword(s)

Graphics Processing Unit (GPU) Computing
Cache Coherence
Memory Consistency
Sharing Tracker
High Performance Computing (HPC) Workloads

Abstract

Current GPU computing models support a mixture of coherent and incoherent classes of memory operations. Workloads using these models typically have working sets too large to fit in an economical SRAM structure. Still, GPU architectures include last-level caches, primarily to fulfill two functions: eliminating redundant DRAM accesses when different L1 caches request the same line, and maintaining on-chip memory coherence for the coherent class of memory operations. In this thesis, we propose an alternative memory system design for GPU architectures that better fits their workloads. Our design features a directory-like sharing tracker that allows the incoherent private L1 caches to directly satisfy remote requests for shared data. It also retains a shared L2 cache with a customized caching policy to support coherent accesses on-chip and to better serve non-coalesced requests that contend aggressively for cache lines. This thesis characterizes the novel tradeoffs among the components of the proposed memory system design in terms of area, energy, and performance. We show that the proposed design achieves a 22% average reduction in DRAM data demand over a standard GPU architecture with a 1MB L2 cache, leading to an average 28% reduction in memory system energy consumption. Alternatively, our results show that the DRAM data demand of the proposed design with a 256KB L2 cache is on par with that of a standard GPU architecture with a 1MB L2 cache, with a smaller area overhead and lower leakage power. Although motivated by the GPU realm, our results are not architecture-specific and can be extended to other throughput-oriented many-core organizations.
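The abstract does not describe how the sharing tracker is organized, so the following Python sketch is only a hypothetical toy model of the idea: a directory-like map from line address to the set of private L1 caches holding that line. An incoherent L1 miss is first offered to a peer L1 found through the tracker and falls back to the L2/DRAM path otherwise, while coherent accesses always go to the shared L2. The names `SharingTracker`, `service_requests`, and the trace format are assumptions for illustration, not the thesis's implementation, and the model ignores L2 capacity, replacement, and timing.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharingTracker:
    # Hypothetical directory-like structure (toy model, not the thesis's design):
    # line address -> ids of the private L1 caches currently holding that line.
    holders: dict = field(default_factory=dict)

    def record_fill(self, line: int, l1_id: int) -> None:
        # An L1 cache received a copy of the line.
        self.holders.setdefault(line, set()).add(l1_id)

    def record_evict(self, line: int, l1_id: int) -> None:
        # An L1 cache dropped its copy; forget the entry once no holder remains.
        owners = self.holders.get(line)
        if owners is not None:
            owners.discard(l1_id)
            if not owners:
                del self.holders[line]

    def find_remote(self, line: int, requester: int) -> Optional[int]:
        # Return some other L1 that holds the line, or None if there is none.
        for l1_id in sorted(self.holders.get(line, ())):
            if l1_id != requester:
                return l1_id
        return None


def service_requests(trace, tracker: SharingTracker) -> Counter:
    """Classify where each L1 miss is serviced in the toy model.

    trace: iterable of (l1_id, line, coherent) tuples, each one an L1 miss.
    Assumption: coherent accesses always go to the shared L2; incoherent
    misses are first served by a peer L1 if the tracker knows of one.
    """
    sources = Counter()
    for l1_id, line, coherent in trace:
        if coherent:
            sources["shared_l2"] += 1
            continue
        if tracker.find_remote(line, l1_id) is not None:
            sources["remote_l1"] += 1
        else:
            sources["l2_or_dram"] += 1
        tracker.record_fill(line, l1_id)
    return sources


if __name__ == "__main__":
    # Three L1s read the same line incoherently; one coherent access follows.
    trace = [(0, 0x40, False), (1, 0x40, False), (2, 0x40, False), (1, 0x80, True)]
    print(service_requests(trace, SharingTracker()))
```

In this trace only the first read of line 0x40 reaches the L2/DRAM path; the next two are served by a peer L1, which illustrates (in a highly simplified way) how L1-to-L1 sharing can avoid redundant requests to the next level.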

Graduation Semester

2012-12

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Electrical and Computer Engineering

Dissertations and Theses in Electrical and Computer Engineering