Withdraw
Loading…
Elasticity and resource aware scheduling in distributed data stream processing systems
Peng, Boyang
Loading…
Permalink
https://hdl.handle.net/2142/78453
Description
- Title
- Elasticity and resource aware scheduling in distributed data stream processing systems
- Author(s)
- Peng, Boyang
- Issue Date
- 2015-04-27
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Elasticity
- Resource Aware Scheduling
- Storm
- Distributed Data Stream Processing
- Abstract
- The era of big data has led to the emergence of new systems for real-time distributed stream processing, e.g., Apache Storm is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks many important and desired features. One important feature is elasticity with clusters running Storm, i.e. change the cluster size on demand. Since the current Storm scheduler uses a naïve round robin approach in scheduling applications, another important feature is for Storm to have an intelligent scheduler that efficiently uses the underlying hardware by taking into account resource demand and resource availability when performing a scheduling. Both are important features that can make Storm a more robust and efficient system. Even though our target system is Storm, the techniques we have developed can be used in other similar stream processing systems. We have created a system called Stela that we implemented in Storm, which can perform on-demand scale-out and scale-in operations in distributed processing systems. Stela is minimally intrusive and disruptive for running jobs. Stela maximizes performance improvement for scale-out operations and minimally decrease performance for scale-in operations while not changing existing scheduling of jobs. Stela was developed in partnership with another Master’s Student, Le Xu [1]. We have created a system called R-Storm that does intelligent resource aware scheduling within Storm. The default round-robin scheduling mechanism currently deployed in Storm disregards resource demands and availability, and can therefore be very inefficient at times. R-Storm is designed to maximize resource utilization while minimizing network latency. When scheduling tasks, R-Storm can satisfy both soft and hard resource constraints as well as minimizing network distance between components that communicate with each other. The problem of mapping tasks to machines can be reduced to Quadratic Multiple 3-Dimensional Knapsack Problem, which is an NP-hard problem. However, our proposed scheduling algorithm within R-Storm attempts to bypass the limitation associated with NP-hard class of problems. We evaluate the performance of both Stela and R-Storm through our implementations of them in Storm by using several micro-benchmark Storm topologies and Storm topologies in use by Yahoo! In. Our experiments show that compared to Apache Storm’s default scheduler, Stela’s scale-out operation reduces interruption time to as low as 12.5% and achieves throughput that is 45-120% higher than Storm’s. And for scale-in operations, Stela achieves almost zero throughput post scale reduction while two other groups experience 200% and 50% throughput decrease respectively. For R-Storm, we observed that schedulings of topologies done by R-Storm perform on average 50%-100% better than that done by Storm’s default scheduler.
- Graduation Semester
- 2015-5
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/78453
- Copyright and License Information
- Copyright 2015 Boyang Peng
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisDissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer ScienceManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…