Mitigating variability in HPC systems and applications for performance and power efficiency
Acun, Bilge
Permalink
https://hdl.handle.net/2142/99502
Description
- Title
- Mitigating variability in HPC systems and applications for performance and power efficiency
- Author(s)
- Acun, Bilge
- Issue Date
- 2017-12-06
- Director of Research (if dissertation) or Advisor (if thesis)
- Kalé, Laxmikant V
- Doctoral Committee Chair(s)
- Kalé, Laxmikant V
- Committee Member(s)
- Abdelzaher, Tarek
- Torrellas, Josep
- Beckman, Pete
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Date of Ingest
- 2018-03-13T17:35:43Z
- Keyword(s)
- Power
- Energy
- Temperature
- Frequency
- High performance computing (HPC)
- Variability
- Data center
- Performance
- Supercomputer
- Energy consumption
- Power variation
- Frequency variation
- Temperature variation
- Energy efficient algorithms
- Cooling power
- Fan control
- Runtime systems
- Load balancing
- Dynamic runtimes
- Manufacturing variations
- Turbo-boost
- Dynamic voltage and frequency scaling (DVFS)
- Parallel computing
- Abstract
- Power consumption and process variability are two important, interconnected challenges for future-generation large-scale High Performance Computing (HPC) data centers. For example, current production petaflop supercomputers consume more than 10 megawatts of machine and cooling power, which costs millions of dollars every year. As HPC moves towards exascale computing, these costs will increase and power consumption is expected to become a major concern. Not only the dynamic behavior of HPC applications but also the dynamic behavior of HPC systems makes it challenging to optimize the performance and power efficiency of large-scale applications. Dynamic behavior of applications arises in irregular or imbalanced applications. Dynamic behavior of HPC systems includes thermal, power, and frequency variations among processors. Smart and adaptive runtime systems have great potential to handle these challenges transparently to the application. In this dissertation, I first analyze frequency, temperature, and power variations in large-scale HPC systems using thousands of cores and different applications. After identifying the cause of each of these variations, I propose solutions that mitigate them to improve performance and power efficiency. When analyzing frequency variation, I identify manufacturing-related intrinsic differences in the chips’ power efficiency as the culprit behind frequency variation under dynamic overclocking. I propose speed-aware dynamic load balancing strategies to mitigate the performance overhead due to frequency variation. When analyzing temperature variation, I focus on inefficiencies in fan-based air cooling systems. I propose proactive and decoupled fan control mechanisms that reduce temperature variations and cooling power consumption by predicting core temperatures using a learning-based model. When analyzing power variations, I identify manufacturing-related sources of power variation, both static and dynamic. I propose different variation-aware node assembly methods to mitigate the power variation. Finally, I propose a fine-grained runtime-based technique to mitigate application-level variations that are caused by the characteristics of the application itself (for example, applications with different kernel types or phases) in order to reduce energy consumption.
- Graduation Semester
- 2017-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/99502
- Copyright and License Information
- Copyright 2017 Bilge Acun
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Siebel School of Computer Science