Adaptive resource allocation for datacenter power capping
Zhang, Linghao
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/124600
Description
- Title
- Adaptive resource allocation for datacenter power capping
- Author(s)
- Zhang, Linghao
- Issue Date
- 2024-05-01
- Director of Research (if dissertation) or Advisor (if thesis)
- Kalbarczyk, Zbigniew T.
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Machine Learning
- Data center
- Power
- Abstract
- In contemporary cloud datacenters, improving energy efficiency is paramount. Datacenter administrators deploy a range of power management strategies, including oversubscription, power capping, and dynamic voltage and frequency scaling (DVFS), to optimize power consumption at various management units (e.g., node level or rack level) within prescribed power budgets. Moreover, the ability to shift energy consumption spatially or temporally across datacenter management units lets operators reduce their carbon footprint by exploiting spatial and temporal variations in carbon intensity. The push toward automation has spurred the exploration of learning-based methods for resource management. This thesis first analyzes the existing top-down approach to datacenter power management, its impact on application performance, and state-of-the-art learning-based resource management agents. We then propose a two-stage framework that holistically addresses the coordination between datacenter power management and resource management. The first stage adopts the existing top-down approach (e.g., FIRM), which starts from a reduction in the power limit (e.g., due to power capping, a power emergency, or power demand shifting) and ends with a resource allocation decision at the container level (e.g., scaling up containers). This stage systematically examines the repercussions of power capping on latency-sensitive datacenter workloads and the efficacy of learning-based resource management frameworks, notably reinforcement learning (RL). Our analysis reveals that a 20% reduction in the power limit, imposed through power capping, causes an 18% drop in resource management performance (as quantified by an RL reward function), culminating in a 50% increase in application latency.
In response, we propose an adaptive resource allocation scheme that ensures smooth, performance-preserving adaptation to power capping for latency-critical (LC) workloads. Our evaluation shows that this scheme significantly improves service-level objective (SLO) adherence under power capping, by 10.2% to 99.3%, while also increasing utilization by 3.1% to 5.8%. The second stage is a bottom-up approach that starts by estimating the energy consumption of all currently running tasks. By proactively estimating the power demand required to meet the LC workloads' SLOs under learning-based resource management, the bottom-up approach aggregates the total power demand at the datacenter cluster level and shifts best-effort (BE) workloads temporally or spatially to stay within the power limit or budget. The benefit of the bottom-up approach is that it avoids mispredictions and sub-optimal resource management decisions for LC jobs and instead proactively exploits the temporal and spatial flexibility of BE jobs. The scheduler's optimization goal is to maximize the daily throughput of BE jobs while meeting both the power budget and the LC jobs' SLOs. We formulate power demand estimation as a regression problem and evaluate several models: a simple fully connected neural network, a decision tree, gradient-boosted trees, and a random forest. Future work on the scheduler will adjust the scaling actions generated by the resource re-allocator based on the estimated power demand.
- Graduation Semester
- 2024-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2024 Linghao Zhang
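The bottom-up stage in the abstract (reserve power for LC workloads first, then fit BE jobs into the remaining headroom) can be illustrated with a minimal sketch. It assumes the per-slot LC power demand has already been predicted (for instance by one of the regression models mentioned) and that each BE job's power draw is known. `BEJob`, `schedule_be_jobs`, the equal-length time slots, the constant budget, and the greedy smallest-first policy are hypothetical illustrations, not the thesis's scheduler. Only temporal shifting is modeled; jobs that fit in no slot are deferred, standing in for shifting them to another day or site.

```python
"""Hypothetical sketch of a bottom-up, power-budget-aware BE job scheduler."""
from dataclasses import dataclass


@dataclass
class BEJob:
    name: str
    power_w: float  # assumed known power draw of the best-effort job, in watts


def schedule_be_jobs(lc_demand_w, budget_w, be_jobs):
    """Greedily place BE jobs into time slots without exceeding the power budget.

    lc_demand_w: predicted LC power demand per time slot (assumed given).
    budget_w: cluster power budget, assumed constant across slots.
    Returns (placement, deferred), where placement maps slot index -> job names.
    """
    # Reserve power for the LC workload first; BE jobs only use what is left.
    headroom = [budget_w - demand for demand in lc_demand_w]
    if any(h < 0 for h in headroom):
        raise ValueError("LC demand alone exceeds the power budget")

    placement = {slot: [] for slot in range(len(lc_demand_w))}
    deferred = []
    # Smallest jobs first: a simple heuristic for fitting many jobs
    # (optimal for a single slot, not guaranteed optimal across several).
    for job in sorted(be_jobs, key=lambda j: j.power_w):
        # Temporal shifting: try the slot with the most remaining headroom.
        best = max(range(len(headroom)), key=lambda s: headroom[s])
        if headroom[best] >= job.power_w:
            headroom[best] -= job.power_w
            placement[best].append(job.name)
        else:
            deferred.append(job.name)  # fits nowhere; defer or move elsewhere
    return placement, deferred


# Usage with made-up numbers: three slots, a 1000 W budget, four BE jobs.
lc_demand = [700.0, 900.0, 600.0]
jobs = [BEJob("a", 150.0), BEJob("b", 300.0), BEJob("c", 250.0), BEJob("d", 400.0)]
placement, deferred = schedule_be_jobs(lc_demand, 1000.0, jobs)
print(placement, deferred)
```

With these inputs the headroom is 300 W, 100 W, and 400 W, so jobs `a` and `c` run while `b` and `d` are deferred. A real scheduler would more likely solve this as an integer program or use the thesis's learned power estimates directly; the greedy loop is only meant to make the "meet the budget, maximize BE throughput" objective concrete.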
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois