Runtime techniques for efficient execution of virtualized, migratable MPI ranks

White, Sam

Permalink

https://hdl.handle.net/2142/117794

Description

Title

Runtime techniques for efficient execution of virtualized, migratable MPI ranks

Author(s)

White, Sam

Issue Date

2022-11-29

Director of Research (if dissertation) or Advisor (if thesis)

Kale, Laxmikant V

Doctoral Committee Chair(s)

Kale, Laxmikant V

Committee Member(s)

Gropp, Bill
Olson, Luke
Hori, Atsushi

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

High Performance Computing
Parallel Computing
Communication
Multithreading
MPI
Charm++
Adaptive MPI

Abstract

The Message Passing Interface (MPI) is the dominant programming system for scientific applications that run on distributed-memory parallel computers. MPI is a library interface specification maintained by the MPI Forum. The first MPI standard was ratified in 1994, with MPICH providing a reference implementation. In the nearly 30 years since, the standard has continued to grow, with version 4.0 ratified in 2021. Still, most MPI implementations today trace their roots back to MPICH or other systems developed in the 1990s. At that time, nodes consisted of a single core, memory hierarchies were relatively flat, systems were highly reliable, and performance was generally predictable. Modern high-performance computing (HPC) systems share none of these characteristics. All nodes are multicore, and on-node parallelism increases year after year. Extreme-scale systems may be reliable as a whole but suffer from individual node and link failures, which limits their usefulness for long-running jobs at large scale. Finally, performance has become harder to predict due to many factors, including processor frequency scaling and contention for shared resources. At the same time, scientific applications have themselves become more dynamic through the use of adaptive mesh refinement, multiscale methods, and multiphysics capabilities that simulate particular regions of interest with higher fidelity. Our work addresses all of these issues through overdecomposition, that is, creating more schedulable tasks than there are cores. As the basis for all of our work, we use Adaptive MPI (AMPI), an MPI implementation built on top of Charm++'s asynchronous tasking runtime system. AMPI virtualizes MPI ranks as user-level, migratable threads rather than as operating system processes. In this thesis, we identify and overcome the issues associated with virtualizing MPI ranks as migratable user-level threads.
These issues include program correctness under virtualized execution; increased per-rank memory footprint; communication performance (both point-to-point and collective, in terms of latency, bandwidth, and asynchrony); and interoperability with other parallel programming systems commonly used on extreme-scale systems. The resulting techniques and insights are applicable to other parallel programming systems and runtimes. As a result of this work, our AMPI implementation is also much more widely applicable and efficient for legacy MPI codes.

Graduation Semester

2022-12

Type of Resource

Thesis

Owning Collections

Graduate Dissertations and Theses at Illinois (primary)

Graduate Theses and Dissertations at Illinois