Techniques for communication optimization of parallel programs in an adaptive runtime system
Robson, Michael P
Description
- Title
- Techniques for communication optimization of parallel programs in an adaptive runtime system
- Author(s)
- Robson, Michael P
- Issue Date
- 2020-07-16
- Director of Research (if dissertation) or Advisor (if thesis)
- Kale, Laxmikant V
- Doctoral Committee Chair(s)
- Kale, Laxmikant V
- Committee Member(s)
- Torrellas, Josep
- Zilles, Craig
- Quinn, Thomas
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- adaptive runtime
- hpc
- high performance computing
- communication optimization
- spreading
- Charm
- Charm++
- Abstract
- With the continuation of Moore’s law and the presumed end of single-core performance improvement, high performance computing (HPC) has turned to increased on-node parallelism to address ever-growing challenges and put ever-increasing transistor counts to use. While this has sustained growth in overall computing performance, supercomputer networks have lagged far behind in their development and are now often the single bottleneck to performance and scalability in modern HPC applications. New machines are consistently built with ‘deeper’ nodes that improve single-node compute performance, as measured by achievable floating point operations per second (FLOPs), relative to earlier generations, without a corresponding increase in network bandwidth or a sufficient decrease in latency. This unequal growth has previously been partially addressed by partitioning duties between runtimes at the shared memory node level, e.g. OpenMP, and the distributed memory communication level, e.g. MPI, to create a model known as MPI+X. In this work, we present an alternative approach to improving the performance of modern HPC applications on current generation supercomputer networks. We focus on combining several benefits of the Charm++ programming model, namely overdecomposition, with OpenMP and the ability to ‘spread’ work across several cores. This allows applications to smoothly inject messages onto the network, constantly overlapping their communication requirements with their compute phases, the overall focus of this work. We further describe a complementary suite of techniques to fully utilize modern supercomputers and balance FLOPs and communication. We extend these techniques through micro-benchmark studies and integration into the production scale Charm++ runtime. We also turn our attention from internode communication optimization to apply the same techniques to intranode communication between hardware devices, i.e. CPUs and graphics processing units (GPUs). We also discuss many of the tradeoffs of these approaches and attempt to quantify their general effect. While embodied in the Charm++ runtime system, these ideas are applicable to a wide swath of communication bound applications, a class of programs we expect only to grow over time given the continuing trend of an increasing differential between node and network performance.
- Graduation Semester
- 2020-08
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/108622
- Copyright and License Information
- Copyright 2020 Michael P. Robson
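The overlap the abstract describes can be illustrated with a small cost model (a sketch only, not Charm++ code): under overdecomposition, one object's message can be in flight while the next object computes, so communication is pipelined behind computation rather than serialized after it. The chunk count and per-chunk costs below are hypothetical, chosen only to make the arithmetic visible.

```python
def total_time_no_overlap(n, compute, comm):
    # Monolithic schedule: each chunk computes, then communicates, serially.
    return n * (compute + comm)

def total_time_pipelined(n, compute, comm):
    # Overdecomposed pipeline: while chunk i's message is on the wire,
    # chunk i+1 computes. The slower stage dominates; the faster stage
    # is hidden except for one startup/drain term.
    return n * max(compute, comm) + min(compute, comm)

# Hypothetical costs: 16 chunks, 1.0 ms compute and 0.5 ms communication each.
serial = total_time_no_overlap(16, 1.0, 0.5)      # 24.0 ms
overlapped = total_time_pipelined(16, 1.0, 0.5)   # 16.5 ms
print(serial, overlapped)
```

In this toy model the communication cost is fully hidden whenever per-chunk compute exceeds per-chunk communication, which is the regime the dissertation's 'spreading' techniques aim to create.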
Owning Collections
- Graduate Dissertations and Theses at Illinois (PRIMARY)
- Dissertations and Theses - Computer Science (Dissertations and Theses from the Dept. of Computer Science)