Energy-efficient latency tolerance for 1000-core data parallel processors with decoupled strands
Crago, Neal
Permalink
https://hdl.handle.net/2142/34589
Description
- Title
- Energy-efficient latency tolerance for 1000-core data parallel processors with decoupled strands
- Author(s)
- Crago, Neal
- Issue Date
- 2012-09-18T21:27:03Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Patel, Sanjay J.
- Doctoral Committee Chair(s)
- Patel, Sanjay J.
- Committee Member(s)
- Hwu, Wen-Mei W.
- Lumetta, Steven S.
- Chen, Deming
- Department of Study
- Electrical and Computer Engineering
- Discipline
- Electrical and Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Parallel Processing
- Data-parallel
- Graphics processing unit (GPU)
- General-purpose computing on graphics processing units (GPGPU)
- manycore
- latency tolerance
- decoupled architecture
- compiler technique
- energy-efficiency
- power-efficiency
- high-performance
- low power
- low energy
- Abstract
- This dissertation presents a novel decoupled latency tolerance technique for 1000-core data parallel processors. The approach focuses on developing instruction latency tolerance to improve performance for a single thread. The main idea behind the approach is to leverage the compiler to split the original thread into separate memory-accessing and memory-consuming instruction streams. The goal is to provide latency tolerance similar to high-performance techniques such as out-of-order execution while leveraging low hardware complexity similar to an in-order execution core. The research in this dissertation supports the following thesis: Pipeline stalls due to long exposed instruction latency are the main performance limiter for cached 1000-core data parallel processors. Leveraging natural decoupling of memory-access and memory-consumption, a serial thread of execution can be partitioned into strands providing energy-efficient latency tolerance. This dissertation motivates the need for latency tolerance in 1000-core data parallel processors and presents decoupled core architectures as an alternative to currently used techniques. This dissertation discusses the limitations of prior decoupled architectures, and proposes techniques to improve both latency tolerance and energy-efficiency. Finally, the success of the proposed decoupled architecture is demonstrated against other approaches by performing an exhaustive design space exploration of energy, area, and performance using high-fidelity performance and physical design models.
- Graduation Semester
- 2012-08
- Permalink
- http://hdl.handle.net/2142/34589
- Copyright and License Information
- Copyright 2012 Neal Crago
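
The strand partitioning described in the abstract can be illustrated with a small functional sketch. This is not the dissertation's compiler or hardware: the function names (`original_thread`, `access_strand`, `execute_strand`, `decoupled_thread`), the FIFO queue, and the example computation are all hypothetical, chosen only to show one thread split into a memory-accessing stream and a memory-consuming stream. The strands run one after the other here, so the sketch shows only that the partitioned program computes the same result; it does not model latency, energy, or concurrent execution.

```python
from collections import deque


def original_thread(a, b):
    """Single in-order thread: each load is immediately followed by its use."""
    out = []
    for i in range(len(a)):
        x = a[i]  # load (long latency on real hardware)
        y = b[i]  # load
        out.append(x * y + 1)  # an in-order core would stall here on a miss
    return out


def access_strand(a, b, queue):
    """Hypothetical memory-accessing strand: performs loads, enqueues operands."""
    for i in range(len(a)):
        queue.append((a[i], b[i]))


def execute_strand(queue):
    """Hypothetical memory-consuming strand: dequeues operands and computes."""
    out = []
    while queue:
        x, y = queue.popleft()
        out.append(x * y + 1)
    return out


def decoupled_thread(a, b):
    """Run both strands, communicating through a FIFO queue (an assumption)."""
    queue = deque()
    access_strand(a, b, queue)
    return execute_strand(queue)


if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5, 6]
    print(original_thread(a, b))
    print(decoupled_thread(a, b))
```

In this sketch the access strand never waits on the arithmetic, so on real hardware it could run ahead and overlap load latency with the consuming strand's work, which is the intuition behind the latency tolerance the abstract describes.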
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer Engineering