An adaptive placement framework for efficient near-data stream processing over data source-edge-cloud systems

Sandur, Atul

Permalink

https://hdl.handle.net/2142/116111

Description

Title

An adaptive placement framework for efficient near-data stream processing over data source-edge-cloud systems

Author(s)

Sandur, Atul

Issue Date

2022-07-15

Director of Research (if dissertation) or Advisor (if thesis)

Agha, Gul

Doctoral Committee Chair(s)

Agha, Gul

Committee Member(s)

Nahrstedt, Klara
Xu, Tianyin
Jeon, Myeongjae

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

stream processing
analytics
edge computing
Internet-of-things
IoT
cloud
datacenter monitoring
server monitoring
near-data
query partitioning
placement
video analytics

Abstract

Large amounts of data are being generated across many different domains, including datacenters, surveillance cameras, mobile devices, and other Internet-of-things (IoT) systems. This data is generated as streams, i.e., unbounded sequences of data items that need to be processed in near real-time. Large-scale datacenters report generating up to tens of petabytes of monitoring data per day from hundreds of thousands of server nodes. High-bandwidth video data, at rates of 1.4 TB/hr or more, will be generated by autonomous vehicles and other IoT systems. Processing this data is challenging due to limited network and compute resources in the cloud. Utilizing compute resources closer to the data source helps reduce network transfer costs and alleviates the compute load on cloud nodes. However, near-data compute resources are limited. For example, data source nodes may have only a small fraction of the compute resources of cloud servers (e.g., 1-2 CPU cores) available for near-data processing. Analytics applications that process streaming data are also becoming complex and increasingly resource-intensive. Thus, not all of the analytics processing can be done on nodes near the data source, while using remote resources incurs network transfer costs. To alleviate the resulting compute-communication bottlenecks, applications are partitioned to reduce network transfer costs: placement decisions split the execution of an application’s operations between cloud and near-data resource nodes. Resource conditions change frequently in such systems, so the placement decisions must adapt as well. As an example, monitoring queries on server nodes (i.e., data sources in monitoring pipelines) are co-located with foreground applications such as web services, whose compute resource needs vary with incoming request traffic. Thus, the resources available to the monitoring queries change over time.
Furthermore, the compute resource needs of applications on edge and cloud nodes are also affected by changing input rates. For example, surveillance camera applications run over dynamic camera networks, with cameras being added to or removed from the network to handle tasks such as object detection and target tracking. In this dissertation, we argue for the need to make highly efficient placement decisions to meet application resource demands under fast-changing resource conditions. Existing systems place application components between near-data resources and cloud nodes, with the goal of better utilizing the available compute resources and improving application performance. However, they do not scale well to the increasing number of application operations and resource nodes in modern stream processing systems. Furthermore, placement techniques that optimize application power consumption on battery-driven data source nodes depend on accurate power prediction tools. Current tools typically rely on analytical power models based on hardware resource parameters, which are challenging to develop for modern hardware with complex and heterogeneous compute resources (e.g., multi-core CPUs, GPUs, TPUs). Moreover, modern application operations have complex power characteristics that depend on the input data stream.
We address the above challenges by building a framework to make scalable placement decisions for near-data stream processing applications. We develop algorithms that identify effective placement decisions for application partitioning between near-data resources and the cloud. We build proof-of-concept systems to implement the proposed algorithms and validate their effectiveness in scaling to large workloads. The first algorithm, designed for making partitioning decisions between edge and cloud nodes, focuses on optimizing the application’s end-to-end processing time.
We map application operations to the resource nodes on which they need to execute, using dynamic programming techniques to scale to large application graphs. The second system investigates a fine-grained placement strategy on data source nodes that can effectively utilize the limited compute resources on these nodes. An algorithm that quickly adapts the placement plan (on the order of seconds) is designed to handle fast-changing resource conditions on data source nodes. Fast adaptation results from combining a heuristic based on a query cost model with a model-agnostic heuristic to make placement decisions. Novel system-level abstractions are introduced in the stream processing pipeline to implement and evaluate the proposed placement algorithm. The third system is a first step towards extending our framework to support energy-aware placement decisions. The power cost of application operators is predicted without using complex hardware- or software-based analytical power models, which allows our system to be easily applied to devices with heterogeneous hardware resources. The power predictions can then be used to make energy-aware placement decisions.
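To make the idea of mapping operations to resource nodes with dynamic programming concrete, the following is a minimal illustrative sketch and not the dissertation's algorithm. It assumes a linear chain of operators, two sites ("edge" and "cloud"), input data that originates at the edge, results that are consumed in the cloud, and a single link with a fixed per-megabyte transfer time. The names `Operator`, `place_chain`, and all cost figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    edge_ms: float   # assumed processing time per batch on an edge node
    cloud_ms: float  # assumed processing time per batch on a cloud node
    out_mb: float    # assumed size of the operator's output per batch


def place_chain(ops, input_mb, link_ms_per_mb):
    """Pick a site for each operator in a chain, minimizing total time.

    best[site] holds (minimal cost so far, placement so far) given that
    the most recently placed operator runs on `site`.
    """
    best = {"edge": (0.0, [])}  # raw data starts at the edge
    data_mb = input_mb          # size of the data entering the next operator
    for op in ops:
        nxt = {}
        for site in ("edge", "cloud"):
            compute = op.edge_ms if site == "edge" else op.cloud_ms
            candidates = []
            for prev, (cost, plan) in best.items():
                # Moving data between sites costs a transfer over the link.
                transfer = 0.0 if prev == site else data_mb * link_ms_per_mb
                candidates.append((cost + transfer + compute, plan + [site]))
            nxt[site] = min(candidates, key=lambda c: c[0])
        best = nxt
        data_mb = op.out_mb
    # Results are assumed to be consumed in the cloud.
    finals = []
    for site, (cost, plan) in best.items():
        transfer = 0.0 if site == "cloud" else data_mb * link_ms_per_mb
        finals.append((cost + transfer, plan))
    return min(finals, key=lambda c: c[0])


# Hypothetical three-operator video pipeline.
ops = [
    Operator("decode", edge_ms=5, cloud_ms=1, out_mb=10.0),
    Operator("filter", edge_ms=8, cloud_ms=2, out_mb=1.0),
    Operator("detect", edge_ms=200, cloud_ms=20, out_mb=0.1),
]
cost, plan = place_chain(ops, input_mb=10.0, link_ms_per_mb=10.0)
print(cost, plan)
```

With these made-up numbers, the cheapest plan keeps the data-reducing operators near the source and ships only the filtered stream to the cloud for the expensive step. The table has one entry per site at each step, so the running time grows linearly with the number of operators. General application graphs, resource capacity limits, and changing conditions are outside the scope of this sketch.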

Graduation Semester

2022-08

Type of Resource

Thesis

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois
