Improving Per-Thread Performance on CMPs through Timing Speculation

Greskamp, Brian

Permalink

https://hdl.handle.net/2142/13148

Description

Title

Improving Per-Thread Performance on CMPs through Timing Speculation

Author(s)

Greskamp, Brian

Issue Date

2009-07-23

Doctoral Committee Chair(s)

Torrellas, Josep

Committee Member(s)

Borkar, Shekhar
Chen, Deming
Patel, Sanjay J.
Zilles, Craig

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

chip multiprocessor
microarchitecture
timing speculation

Language

Abstract

The future of performance scaling lies in massively parallel workloads, but less-parallel applications will remain important. Unfortunately, future process technologies and core microarchitectures no longer promise major per-thread performance improvements, so microarchitects must find new ways to address a growing per-thread performance deficit. Moreover, they must do so without sacrificing parallel throughput. To meet these apparently conflicting demands, this dissertation proposes a Timing Speculation (TS) system for chip multiprocessors (CMPs) that boosts core clock frequencies past their normal limits when an application demands per-thread performance and operates efficiently at nominal frequency when it demands throughput. The contributions are organized into three interlocking proposals. The first is Paceline, the first TS microarchitecture designed specifically for CMPs. Paceline enables two cores to work together to execute a single thread at high speed under TS or independently to execute two threads at the rated frequency. In single-thread mode, one core in the pair, the "Leader," executes at higher-than-normal frequency, while the other, the "Checker," runs at the rated, safe frequency. The Leader runs the program faster but may experience timing errors. To detect and correct these errors, the Checker periodically compares a hash of its architectural state with that of the Leader. The Leader helps the Checker keep up by passing it branch results and prefetches. Next, this dissertation enhances Paceline with BlueShift, a circuit design method for TS architectures that improves a circuit's common-case delay rather than focusing on worst-case delay as traditional design flows do. BlueShift profiles a gate-level design as it runs real benchmark applications to identify the frequently exercised circuit paths and then applies speed optimizations to those paths only. These optimizations can be implemented so that they can be enabled and disabled at run time and therefore exact no power cost when they are not needed (i.e., when the processor is executing a throughput workload). Finally, this work presents LeadOut, a CMP design that combines Paceline with an additional per-thread performance enhancement: the ability to increase core supply voltage above nominal. LeadOut evaluates the performance gains that are possible with Paceline alone, voltage boosting alone, and both together. It shows major gains from applying the two techniques together when feasible and also shows that, in many cases, future CMPs have power and temperature headroom to exploit still more per-thread enhancements as long as they can be enabled and disabled dynamically according to application demand.
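
The Leader/Checker comparison described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the dissertation's hardware design: the `Core` class, the four-register toy state, the use of SHA-256 as the state hash, the modeling of a timing error as one flipped bit, and the recovery policy (copying the Checker's state back into the Leader) are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy core whose architectural state is four 32-bit registers (assumption)."""
    regs: list = field(default_factory=lambda: [0] * 4)

    def step(self, i, fault=False):
        # Toy "instruction": accumulate the instruction index into a register.
        r = i % 4
        self.regs[r] = (self.regs[r] + i) & 0xFFFFFFFF
        if fault:
            # Model a timing error as a single flipped bit (assumption).
            self.regs[r] ^= 1

    def state_hash(self):
        # Hash of the architectural state; SHA-256 is an illustrative choice.
        return hashlib.sha256(repr(self.regs).encode()).hexdigest()


def run(n_instr, interval, fault_at=None):
    """Execute n_instr toy instructions, comparing state hashes every `interval`."""
    leader, checker = Core(), Core()
    recoveries = 0
    start = 0
    while start < n_instr:
        end = min(start + interval, n_instr)
        # The Leader executes the interval first and may suffer a timing error.
        for i in range(start, end):
            leader.step(i, fault=(i == fault_at))
        # The Checker re-executes the same interval at the safe frequency.
        for i in range(start, end):
            checker.step(i)
        # Periodic comparison: a mismatch means the Leader's state is corrupt.
        if leader.state_hash() != checker.state_hash():
            leader.regs = list(checker.regs)  # hypothetical recovery policy
            recoveries += 1
        start = end
    return leader.regs, checker.regs, recoveries
```

Calling `run(100, 10, fault_at=37)` injects one error into the Leader, which the next comparison catches and repairs; without `fault_at`, the two cores never diverge. The model leaves out the branch results and prefetches that the Leader passes to the Checker, since those affect speed rather than correctness.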

Type of Resource

text

Owning Collections

Dissertations and Theses - Computer Science

Dissertations and Theses from the Dept. of Computer Science

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois
