Building interactive distributed processing applications at a global scale
Abdollahian Noghabi, Shadi
Permalink
https://hdl.handle.net/2142/102428
Description
- Title
- Building interactive distributed processing applications at a global scale
- Author(s)
- Abdollahian Noghabi, Shadi
- Issue Date
- 2018-11-13
- Director of Research (if dissertation) or Advisor (if thesis)
- Campbell, Roy H.
- Gupta, Indranil
- Doctoral Committee Chair(s)
- Campbell, Roy H.
- Gupta, Indranil
- Committee Member(s)
- Nahrstedt, Klara
- Bahl, Victor
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Distributed Systems
- Cloud Computing
- Geo-distributed
- Low Latency
- Large Scale
- Distributed Storage
- Stream Processing
- Edge Computing
- Container Networking
- Abstract
- Along with the continuous engagement with technology, many latency-sensitive interactive applications have emerged, e.g., global content sharing in social networks, adaptive lights/temperatures in smart buildings, and online multi-user games. These applications typically process a massive amount of data at a global scale. In these cases, distributing storage and processing is key to handling the large scale. Distribution necessitates handling two main aspects: a) the placement of data/processing and b) the data motion across the distributed locations. However, handling the distribution while meeting latency guarantees at large scale comes with many challenges around hiding heterogeneity and diversity of devices and workload, handling dynamism in the environment, providing continuous availability despite failures, and supporting persistent large state. In this thesis, we show how latency-driven designs for placement and data motion can be used to build production infrastructures for interactive applications at a global scale, while also addressing myriad challenges in heterogeneity, dynamism, state, and availability. We demonstrate that a latency-driven approach is general and applicable at all layers of the stack: from storage, to processing, down to networking. We designed and built four distinct systems across the spectrum. First, we have developed Ambry (collaboration with LinkedIn), a geo-distributed storage system for interactive data sharing across the globe. Ambry is LinkedIn's mainstream production system for all its media content, running across 4 datacenters and serving over 500 million users. Ambry minimizes user-perceived latency via smart data placement and propagation. Second, we have built two processing systems: a traditional model, Samza, and an avant-garde model, Steel. Samza (collaboration with LinkedIn) is a production stream processing framework used at 15 companies (including LinkedIn, Uber, Netflix, and TripAdvisor), powering >200 pipelines at LinkedIn alone. Samza minimizes the impact of data motion on the end-to-end latency, thus enabling large persistent state (100s of TB) along with processing. Steel (collaboration with Microsoft) extends processing to the emerging edge. Integrated with Azure, Steel dynamically optimizes placement and data motion across the entire edge-cloud environment. Finally, we have designed FreeFlow, a high-performance networking mechanism for containers. Using the container placement, FreeFlow opportunistically bypasses networking layers, minimizing data motion and reducing latency (up to 3 orders of magnitude).
- Graduation Semester
- 2018-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/102428
- Copyright and License Information
- Copyright 2018 Shadi Abdollahian Noghabi
Owning Collections
- Graduate Dissertations and Theses at Illinois (primary): Graduate Theses and Dissertations at Illinois
- Dissertations and Theses - Computer Science: Dissertations and Theses from the Dept. of Computer Science