Optimizing data movement in cloud-bursting HPC environments through dynamic labeling and prefetching strategies
Tao, Huili
This item is available for download only to members of the University of Illinois community. Others can request a copy of an Illinois-restricted dissertation or thesis through their library's Inter-Library Loan office or purchase one directly from ProQuest.
Permalink
https://hdl.handle.net/2142/124598
Description
Title
Optimizing data movement in cloud-bursting HPC environments through dynamic labeling and prefetching strategies
Author(s)
Tao, Huili
Issue Date
2024-05-01
Director of Research (if dissertation) or Advisor (if thesis)
Kindratenko, Volodymyr
Department of Study
Electrical and Computer Engineering
Discipline
Electrical and Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
HPC, Cloud bursting, Data Movement
Abstract
Hybrid systems that combine High-Performance Computing (HPC) clusters with the Cloud are gaining popularity among researchers for their ability to absorb sudden demand spikes, which shortens turnaround times for HPC tasks. However, deploying workloads on such systems is challenging for two reasons: data must be migrated between HPC clusters and the Cloud, and existing schedulers lack support for hybrid environments. To address these issues, we present an HPC-Cloud bursting system built on Ray, an open-source distributed framework. Our system integrates automated data management with data prefetching and learning-based scheduling at the function level.
In this project, my primary focus was implementing dynamic labeling within the Ray framework, which allows node labels to be adjusted at runtime. The dynamic labels are integrated with the workload scheduler so that data can be prefetched to the most suitable nodes. I also played a key role in improving our system's compatibility with cloud storage services, extending its versatility and usability.
We evaluate our framework on two common workloads: machine learning model training and image processing. For both workloads, our system shows consistent advantages over a manual data-fetching baseline across diverse data locations and network speeds.
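As a rough illustration of the dynamic labeling and prefetching idea summarized above, the following is a minimal sketch in plain Python. It is not the thesis's implementation and does not use Ray's API: the `Node`, `LabelRegistry`, and `prefetch` names, the label keys, and the "first matching node" selection rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical compute node with mutable labels and a local data cache."""
    name: str
    labels: dict = field(default_factory=dict)
    cached: set = field(default_factory=set)


class LabelRegistry:
    """Hypothetical registry whose node labels can be changed at runtime."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def set_label(self, node_name, key, value):
        # Update a label while the system is running (no restart needed).
        self.nodes[node_name].labels[key] = value

    def match(self, **required):
        # Return nodes whose labels contain every required key/value pair.
        return [
            n for n in self.nodes.values()
            if all(n.labels.get(k) == v for k, v in required.items())
        ]


def prefetch(registry, dataset, **required):
    """Place `dataset` on the first matching node before any task runs there."""
    candidates = registry.match(**required)
    if not candidates:
        return None
    target = candidates[0]
    target.cached.add(dataset)  # stands in for a real data transfer
    # Record the new data location as a label so later placement can use it.
    registry.set_label(target.name, "has_" + dataset, True)
    return target.name


registry = LabelRegistry([
    Node("hpc-0", {"site": "hpc"}),
    Node("cloud-0", {"site": "cloud"}),
])
registry.set_label("cloud-0", "idle", True)  # label changed at runtime
chosen = prefetch(registry, "images", site="cloud", idle=True)
```

In this sketch the scheduler side is reduced to a label match; a learning-based scheduler such as the one the abstract describes would replace the "first matching node" rule with its own choice of target.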