Improving Per-Thread Performance on CMPs through Timing Speculation
Greskamp, Brian
Permalink
https://hdl.handle.net/2142/13148
Description
- Title
- Improving Per-Thread Performance on CMPs through Timing Speculation
- Author(s)
- Greskamp, Brian
- Issue Date
- 2009-07-23
- Doctoral Committee Chair(s)
- Torrellas, Josep
- Committee Member(s)
- Borkar, Shekhar
- Chen, Deming
- Patel, Sanjay J.
- Zilles, Craig
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- chip multiprocessor
- microarchitecture
- timing speculation
- Language
- en
- Abstract
- The future of performance scaling lies in massively parallel workloads, but less-parallel applications will remain important. Unfortunately, future process technologies and core microarchitectures no longer promise major per-thread performance improvements, so microarchitects must find new ways to address a growing per-thread performance deficit, and they must do so without sacrificing parallel throughput. To meet these apparently conflicting demands, this dissertation proposes a Timing Speculation (TS) system for CMPs that boosts core clock frequencies past their normal limits when an application demands per-thread performance and operates efficiently at nominal frequency when it demands throughput. The contributions are organized into three interlocking proposals.
First, this work introduces Paceline, the first TS microarchitecture designed specifically for CMPs. Paceline enables two cores either to work together to execute a single thread at high speed under TS or to run independently, executing two threads at the rated frequency. In single-thread mode, one core in the pair (the "Leader") executes at higher-than-normal frequency, while the other (the "Checker") runs at the rated, safe frequency. The Leader runs the program faster but may experience timing errors. To detect and correct these errors, the Checker periodically compares a hash of its architectural state with that of the Leader. The Leader helps the Checker keep up by passing it branch results and prefetches.
Next, this dissertation enhances Paceline with BlueShift, a circuit design method for TS architectures that improves a circuit's common-case delay rather than focusing on worst-case delay as traditional design flows do. BlueShift profiles a gate-level design as it runs real benchmark applications to identify the frequently exercised circuit paths and then applies speed optimizations to those paths only. These optimizations can be implemented so that they can be enabled and disabled at run time, so they exact no power cost when they are not needed (i.e., when the processor is executing a throughput workload).
Finally, this work presents LeadOut, a CMP design that combines Paceline with an additional per-thread performance enhancement: the ability to increase core supply voltage above nominal. LeadOut evaluates the performance gains that are possible with Paceline alone, with voltage boosting alone, and with both together. It shows major gains from applying the two techniques together when feasible and also shows that, in many cases, future CMPs have power and temperature headroom to exploit still more per-thread enhancements, as long as they can be enabled and disabled dynamically according to application demand.
- Type of Resource
- text
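The Leader/Checker arrangement summarized in the abstract can be illustrated with a toy software model. The following is only a sketch under stated assumptions, not the dissertation's design: the register-file state, the `step` instruction, the bit-flip fault model, the checkpoint `interval`, and the recovery policy (copying the Checker's state into the Leader on a mismatch) are all hypothetical choices made for illustration. The abstract itself states only that the Checker periodically compares a hash of its architectural state with the Leader's.

```python
import hashlib
from typing import List, Set, Tuple

MASK = (1 << 64) - 1  # keep toy registers within 64 bits


def state_hash(state: List[int]) -> str:
    """Digest of the toy architectural state, so cores compare one value."""
    data = b"".join(r.to_bytes(8, "little") for r in state)
    return hashlib.sha256(data).hexdigest()


def step(state: List[int], i: int) -> List[int]:
    """One hypothetical instruction: update a register from its neighbor."""
    nxt = list(state)
    r = i % len(state)
    nxt[r] = (state[r] * 3 + state[(r + 1) % len(state)] + i) & MASK
    return nxt


def run_pair(n_steps: int, interval: int, fault_steps: Set[int],
             n_regs: int = 4) -> Tuple[List[int], List[int], int]:
    """Run Leader and Checker, comparing state hashes every `interval` steps.

    `fault_steps` lists the steps at which the Leader suffers a modeled
    timing error (assumed here to flip one bit of register 0).
    """
    leader = [1] * n_regs
    checker = list(leader)
    recoveries = 0
    for start in range(0, n_steps, interval):
        end = min(start + interval, n_steps)
        # Leader: runs ahead at boosted frequency and may see timing errors.
        for i in range(start, end):
            leader = step(leader, i)
            if i in fault_steps:
                leader[0] ^= 1
        # Checker: executes the same interval at the safe, rated frequency.
        for i in range(start, end):
            checker = step(checker, i)
        # Periodic check: a hash mismatch reveals a Leader error.
        if state_hash(leader) != state_hash(checker):
            leader = list(checker)  # assumed recovery: resync from Checker
            recoveries += 1
    return leader, checker, recoveries


if __name__ == "__main__":
    final_leader, final_checker, n = run_pair(20, 5, {7})
    print("recoveries:", n, "states match:", final_leader == final_checker)
```

In this model the Checker never errs, so its state serves as the reference at every checkpoint; a shorter `interval` bounds how much Leader work is discarded per error at the cost of more frequent comparisons.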
Owning Collections
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science
Graduate Dissertations and Theses at Illinois (PRIMARY)
Graduate Theses and Dissertations at Illinois