Elastic techniques to handle dynamism in real-time data processing systems
Xu, Le
Permalink
https://hdl.handle.net/2142/113814
Description
- Title
- Elastic techniques to handle dynamism in real-time data processing systems
- Author(s)
- Xu, Le
- Issue Date
- 2021-12-01
- Director of Research (if dissertation) or Advisor (if thesis)
- Gupta, Indranil
- Doctoral Committee Chair(s)
- Gupta, Indranil
- Committee Member(s)
- Nahrstedt, Klara
- Huang, Jian
- Venkataraman, Shivaram
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Distributed Systems, Big-data Analytics, Real-time Data Stream Processing
- Abstract
- Real-time data processing is a crucial component of cloud computing today. It is widely adopted to provide an up-to-date view of data for social networks, cloud management, web applications, and edge and IoT infrastructures. Real-time processing frameworks are designed for time-sensitive tasks such as event detection, real-time data analysis, and prediction. Compared to offline batch processing, real-time data processing applications tend to be long-running and are prone to performance issues caused by many unpredictable environmental variables, including (but not limited to) job specifications, user expectations, and available resources. To cope with this challenge, it is crucial for system designers to improve frameworks' ability to adjust their resource usage in response to changing environmental variables, a property defined as system elasticity. This thesis investigates how elastic resource provisioning helps cloud systems today process real-time data with predictable performance, in an automated manner, under workload dynamism. We explore new algorithms, framework designs, and efficient system implementations to achieve this goal. At the same time, distributed systems today must continuously handle varying application specifications, hardware configurations, and workload characteristics. Maintaining stable performance requires systems both to plan resource allocation explicitly when an application starts and to tailor that allocation dynamically at run time. In this thesis, we show how achieving system elasticity helps systems provide tunable performance under the dynamism of many environmental variables without compromising resource efficiency.
Specifically, this thesis focuses on the following two aspects:
i) Elasticity-aware scheduling: Real-time data processing systems today are often designed in a resource- and workload-agnostic fashion. As a result, most users are unable to perform resource planning before launching an application or to adjust resource allocation (both within and across application boundaries) intelligently at run time. The first part of this thesis (Stela [1], Henge [2], Getafix [3]) explores efficient mechanisms for performance analysis that also enable elasticity-aware scheduling in today's cloud frameworks.
ii) Resource-efficient cloud stack: The second line of work in this thesis aims to improve underlying cloud stacks to support self-adaptive, highly efficient resource provisioning. Today's cloud systems enforce full isolation, which prevents fine-grained resource sharing among applications over time. This work (Cameo [4], Dirigo) builds real-time data processing systems for emerging cloud infrastructures that achieve high resource utilization through fine-grained resource sharing.
Given that the market for real-time data analysis is expected to grow at an annual rate of 28.2% and reach 35.5 billion by the year 2024 [5], improving system elasticity can significantly reduce deployment cost and increase resource utilization. Our work improves the performance of real-time data analytics applications within resource constraints. We highlight some of the improvements below:
i) Stela explores elastic techniques for single-tenant, on-demand dataflow scale-out and scale-in operations. It improves post-scale throughput by 45-120% during on-demand scale-out and by 2-5× during on-demand scale-in.
ii) Henge develops a mechanism to map an application's performance onto a unified scale of resource needs. It reduces resource consumption by 40-60% while maintaining the same level of SLO achievement throughout the cluster.
iii) Getafix implements a strategy to analyze workloads dynamically and proposes a solution that adaptively determines both the number of replicas to generate and the placement plan for those replicas. It achieves comparable query latency (both average and tail) while achieving 1.45-2.15× memory savings.
iv) Cameo proposes a scheduler that supports data-driven, fine-grained operator execution guided by user expectations. It improves cluster utilization by 6× and reduces performance violations by 72% while packing more jobs into a shared cluster.
v) Dirigo performs fully decentralized, function-state-aware, global message scheduling for stateful functions. It reduces tail latency by 60% compared to a local scheduling approach and reduces remote state accesses by 19× compared to a scheduling approach that is unaware of function states.
These works can lead to substantial cost savings for both cloud providers and end-users.
- Graduation Semester
- 2021-12
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/113814
- Copyright and License Information
- © 2021 Le Xu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY