Enhancing the reliability of dynamic management in cloud infrastructures
Liu, Bingzhe
Permalink
https://hdl.handle.net/2142/124161
Description
- Title
- Enhancing the reliability of dynamic management in cloud infrastructures
- Author(s)
- Liu, Bingzhe
- Issue Date
- 2024-04-23
- Director of Research (if dissertation) or Advisor (if thesis)
- Godfrey, Brighten
- Doctoral Committee Chair(s)
- Godfrey, Brighten
- Committee Member(s)
- Beckett, Ryan
- Gupta, Indranil
- Xu, Tianyin
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Reliability
- Cloud Infrastructure
- Kubernetes
- Cloud Management
- Formal methods
- Verification
- Synthesis
- Failure Studies
- Abstract
- Modern cloud infrastructure is powered by cluster management systems such as Kubernetes and Docker Swarm. These systems consist of multiple dynamic controllers that continuously monitor the underlying systems to collect metrics and move the system toward the desired goal. While these systems seek to minimize users' operational burden, their complex, dynamic, and non-deterministic nature makes them hard to reason about, potentially leading to failures ranging from performance degradation to outages. In this thesis, we aim to enhance the reliability of dynamic management in cloud infrastructure. We first conduct a series of failure studies on a diverse range of outages to gain insight into how things can go wrong. Leveraging these insights, we then apply formal methods to improve reliability. In particular, we develop Strategyzer, a system that automatically synthesizes reliable management plans for telecommunication operators, where the plans involve both non-trivial dynamic controllers and human actions. We implement two management examples in Strategyzer and show that the approach generates plans accurately for these tasks. We then discuss Kivi, the first system for verifying controllers and their configurations in cluster management systems. Kivi takes the users' intent, the state and configuration of the cluster, and the event assumptions as inputs, and checks whether the cluster can violate the intent. If a violation is possible, it generates a minimal counterexample. We show that Kivi is effective and accurate in finding issues in realistic and complex scenarios and showcase two new issues in Kubernetes controller source code.
- Graduation Semester
- 2024-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2024 Bingzhe Liu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois