Elastic techniques to handle dynamism in real-time data processing systems
Xu, Le
Permalink
https://hdl.handle.net/2142/113814
Description
- Title
- Elastic techniques to handle dynamism in real-time data processing systems
- Author(s)
- Xu, Le
- Issue Date
- 2021-12-01
- Director of Research (if dissertation) or Advisor (if thesis)
- Gupta, Indranil
- Doctoral Committee Chair(s)
- Gupta, Indranil
- Committee Member(s)
- Nahrstedt, Klara
- Huang, Jian
- Venkataraman, Shivaram
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Distributed Systems, Big-data Analytics, Real-time Data Stream Processing
- Abstract
- Real-time data processing is a crucial component of cloud computing today. It is widely adopted to provide an up-to-date view of data for social networks, cloud management, web applications, and edge and IoT infrastructures. Real-time processing frameworks are designed for time-sensitive tasks such as event detection, real-time data analysis, and prediction. Compared to offline batch processing, real-time data processing applications tend to be long-running and are prone to performance issues caused by many unpredictable environmental variables, including (but not limited to) job specifications, user expectations, and available resources. To cope with this challenge, it is crucial for system designers to improve frameworks' ability to adjust their resource usage in response to changing environmental variables, a property defined as system elasticity. This thesis investigates how elastic resource provisioning helps cloud systems today process real-time data with predictable performance, in an automated manner, under workload dynamism. We explore new algorithms, framework designs, and efficient system implementations to achieve this goal. At the same time, distributed systems today must continuously handle varying application specifications, hardware configurations, and workload characteristics. Maintaining stable performance requires systems both to plan resource allocation explicitly when an application starts and to tailor that allocation dynamically at run time. In this thesis, we show how achieving system elasticity helps systems provide tunable performance under the dynamism of many environmental variables without compromising resource efficiency.
Specifically, this thesis focuses on the following two aspects:
i) Elasticity-aware scheduling: Real-time data processing systems today are often designed in a resource- and workload-agnostic fashion. As a result, most users are unable to perform resource planning before launching an application or to adjust resource allocation (both within and across application boundaries) intelligently at run time. The first part of this thesis (Stela [1], Henge [2], Getafix [3]) explores efficient mechanisms for performance analysis that also enable elasticity-aware scheduling in today's cloud frameworks.
ii) Resource-efficient cloud stack: The second line of work in this thesis aims to improve underlying cloud stacks to support self-adaptive, highly efficient resource provisioning. Today's cloud systems enforce full isolation, which prevents fine-grained resource sharing among applications over time. This work (Cameo [4], Dirigo) builds real-time data processing systems for emerging cloud infrastructures that achieve high resource utilization through fine-grained resource sharing.
Given that the market for real-time data analysis is expected to grow at an annual rate of 28.2% and reach 35.5 billion by the year 2024 [5], improving system elasticity can significantly reduce deployment cost and increase resource utilization. Our work improves the performance of real-time data analytics applications within resource constraints. We highlight some of the improvements below:
i) Stela explores elastic techniques for single-tenant, on-demand dataflow scale-out and scale-in operations. It improves post-scale throughput by 45-120% during on-demand scale-out and by 2-5× during on-demand scale-in.
ii) Henge develops a mechanism to map an application's performance onto a unified scale of resource needs. It reduces resource consumption by 40-60% while maintaining the same level of SLO achievement throughout the cluster.
iii) Getafix implements a strategy to analyze workloads dynamically and proposes a solution that adaptively determines both the number of replicas to generate and the placement plan for those replicas. It achieves comparable query latency (both average and tail) while achieving 1.45-2.15× memory savings.
iv) Cameo proposes a scheduler that supports data-driven, fine-grained operator execution guided by user expectations. It improves cluster utilization by 6× and reduces performance violations by 72% while packing more jobs into a shared cluster.
v) Dirigo performs fully decentralized, function-state-aware, global message scheduling for stateful functions. It reduces tail latency by 60% compared to a local scheduling approach and reduces remote state accesses by 19× compared to a scheduling approach that is unaware of function states.
These works can lead to substantial cost savings for both cloud providers and end-users.
- Graduation Semester
- 2021-12
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/113814
- Copyright and License Information
- © 2021 Le Xu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY