Runtime techniques for efficient execution of virtualized, migratable MPI ranks
White, Sam
Permalink
https://hdl.handle.net/2142/117794
Description
- Title
- Runtime techniques for efficient execution of virtualized, migratable MPI ranks
- Author(s)
- White, Sam
- Issue Date
- 2022-11-29
- Director of Research (if dissertation) or Advisor (if thesis)
- Kale, Laxmikant V
- Doctoral Committee Chair(s)
- Kale, Laxmikant V
- Committee Member(s)
- Gropp, Bill
- Olson, Luke
- Hori, Atsushi
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- High Performance Computing
- Parallel Computing
- Communication
- Multithreading
- MPI
- Charm++
- Adaptive MPI
- Abstract
- The Message Passing Interface (MPI) is the dominant programming system for scientific applications that run on distributed memory parallel computers. MPI is a library specification or standard maintained by the MPI Forum. The first MPI standard was ratified in 1994, with MPICH providing a reference software implementation. Over the nearly 30 years since, the MPI standard has continued to grow, with version 4.0 ratified in 2021. Still, most MPI implementations today trace their roots back to MPICH or other systems developed in the 1990s. At that time, nodes consisted of a single core, memory hierarchies were relatively flat, systems had very high reliability, and performance was generally predictable. Modern HPC systems share none of these characteristics. All nodes are multicore, with increasing on-node parallelism available year after year. Extreme scale systems may be reliable as a system but suffer from individual node and link failures, limiting their usefulness for long-running jobs at large scale. Finally, performance has become harder to predict due to many factors, including processor frequency scaling and contention over shared resources. At the same time, scientific applications have become more dynamic themselves through the use of adaptive mesh refinements, multiscale methods, and multiphysics capabilities in order to simulate particular areas of interest with higher fidelity. Our work addresses all of these issues through overdecomposition, creating more schedulable tasks than cores. We use Adaptive MPI (AMPI), an MPI implementation developed on top of Charm++'s asynchronous tasking runtime system, as the basis for all of our work. AMPI works by virtualizing MPI ranks as user-level, migratable threads rather than operating system processes. In this thesis, we identify and overcome the issues associated with virtualizing MPI ranks as migratable user-level threads. 
These issues include program correctness under virtualized execution, increased per-rank memory footprint, communication performance (both point-to-point and collective, in terms of latency, bandwidth, and asynchrony), and interoperability with other parallel programming systems commonly used on extreme scale systems. The resulting techniques and insights are applicable to other parallel programming systems and runtimes, and as a result our AMPI implementation is much more widely applicable and efficient for legacy MPI codes.
- Graduation Semester
- 2022-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Sam White
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois