Dossier: Distributed operating system and infrastructure for scientific data management
Nguyen, Phuong Viet
Permalink
https://hdl.handle.net/2142/101566
Description
- Title
- Dossier: Distributed operating system and infrastructure for scientific data management
- Author(s)
- Nguyen, Phuong Viet
- Issue Date
- 2018-07-12
- Director of Research (if dissertation) or Advisor (if thesis)
- Nahrstedt, Klara
- Doctoral Committee Chair(s)
- Nahrstedt, Klara
- Committee Member(s)
- Campbell, Roy H.
- Gupta, Indranil
- Turaga, Deepak
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- cyberinfrastructure
- microservice architecture
- adaptive control
- edge-cloud architecture
- scientific data management
- Abstract
- As scientific advancement and discovery have become increasingly data-driven and interdisciplinary, there is an urgent need for advanced cyberinfrastructure to support managing and processing the scientific data generated from day-to-day research. However, the development of data-driven cyberinfrastructure for scientific research areas has often lagged behind the development of such tools in other engineering and IT-related fields. This development gap is due to several diversity challenges in scientific data management and processing. First, there is the diversity of scientific data and data processing tasks: the cyberinfrastructure should be able to support managing and processing heterogeneous types of scientific data captured from scientific instruments. Second, there is the diversity of users and scientific workloads, which is challenging to handle because the cyberinfrastructure must help shorten the time from digital capture of data to interpretation and insights. Third, there is the diversity of scientific instruments. Since a significant number of scientific instruments still run their scientific software tools on old operating systems (e.g., Windows XP, Windows NT, Windows 2000), the cyberinfrastructure must help bridge the performance and security gap between old scientific instruments and its advanced cloud-based infrastructure. In this thesis, we aim to address the above diversity challenges by taking a holistic approach to designing a distributed operating system and infrastructure for scientific data management, named DOSSIER. At the core of DOSSIER is an adaptive control microservice infrastructure that is designed to tackle the aforementioned challenges of data cyberinfrastructure for distributed scientific data management.
In particular, to handle heterogeneous scientific data processing and analysis, we start by redesigning the execution environment for scientific workflows, which traditionally follows a monolithic approach, using a novel microservice architecture and the latest virtualization technology (i.e., container technology). The microservice design enables dynamic composition of workflows and is thus efficient in dealing with heterogeneous workflows. The new microservice architecture also allows us to express system resources in a simpler way, which enables the design of a new adaptive resource management mechanism to handle large-scale and dynamic scientific workloads. We are the first to apply feedback control theory to design a self-adaptation mechanism for a scientific workflow management system to help shorten the time from data acquisition to insights. To address the security and performance gap when connecting old scientific instruments to cloud-based cyberinfrastructure, we design an edge-cloud architecture in which cloudlet servers are directly connected to the scientific instruments and act as a security shield for the aging instruments. Cloudlets also coordinate with the cloud-based backend system to tackle the performance issue by scheduling data transfers and offloading processing tasks to cloudlets, which avoids traffic congestion and guarantees the performance of data processing jobs across the edge-cloud architecture. By designing, developing, and testing DOSSIER in real scientific environments, we demonstrate that an edge-cloud microservice architecture with learning-based adaptive control resource management is needed for timely distributed scientific data management.
- Graduation Semester
- 2018-08
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/101566
- Copyright and License Information
- Copyright 2018 Phuong Nguyen
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science