Techniques for communication optimization of parallel programs in an adaptive runtime system
Robson, Michael P
Description
- Title
- Techniques for communication optimization of parallel programs in an adaptive runtime system
- Author(s)
- Robson, Michael P
- Issue Date
- 2020-07-16
- Director of Research (if dissertation) or Advisor (if thesis)
- Kale, Laxmikant V
- Doctoral Committee Chair(s)
- Kale, Laxmikant V
- Committee Member(s)
- Torrellas, Josep
- Zilles, Craig
- Quinn, Thomas
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- adaptive runtime
- hpc
- high performance computing
- communication optimization
- spreading
- Charm
- Charm++
- Abstract
- With the continuation of Moore’s law and the presumed end of single-core performance improvement, high performance computing (HPC) has turned to increased on-node parallelism to address ever-growing challenges and put ever-increasing transistor counts to use. While this has sustained growth in overall computing performance, supercomputer networks have lagged far behind in their development and are now often the single bottleneck to performance and scalability in modern HPC applications. New machines are consistently built with ‘deeper’ nodes that improve single-node compute performance, as measured by achievable floating point operations per second (FLOPs), relative to earlier generations, without a corresponding increase in network bandwidth or a sufficient decrease in latency. This unequal growth has previously been partially addressed by partitioning duties between runtimes at the shared memory node level, e.g. OpenMP, and the distributed memory communication level, e.g. MPI, to create a model known as MPI+X. In this work, we present an alternative approach to improving the performance of modern HPC applications on current generation supercomputer networks. We focus on combining several benefits of the Charm++ programming model, namely overdecomposition, with OpenMP and the ability to ‘spread’ work across several cores. This allows applications to smoothly inject messages onto the network, constantly overlapping their communication requirements with their compute phases, the overall focus of this work. We further describe a complementary suite of techniques to fully utilize modern supercomputers and balance FLOPs and communication. We extend these techniques through micro-benchmark studies and integration into the production scale Charm++ runtime. We also turn our attention from internode communication optimization to apply the same techniques to intranode communication between hardware devices, i.e. CPUs and graphics processing units (GPUs). We also discuss many of the tradeoffs of these approaches and attempt to quantify their general effect. While embodied in the Charm++ runtime system, these ideas are applicable to a wide swath of communication bound applications, a class of programs we expect only to grow over time given the continuing trend of an increasing differential between node and network performance.
- Graduation Semester
- 2020-08
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/108622
- Copyright and License Information
- Copyright 2020 Michael P. Robson
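The overlap the abstract describes can be illustrated with a small cost model (a sketch only, not Charm++ code): under overdecomposition, one object's message can be in flight while the next object computes, so communication is pipelined behind computation rather than serialized after it. The chunk count and per-chunk costs below are hypothetical, chosen only to make the arithmetic visible.

```python
def total_time_no_overlap(n, compute, comm):
    # Monolithic schedule: each chunk computes, then communicates, serially.
    return n * (compute + comm)

def total_time_pipelined(n, compute, comm):
    # Overdecomposed pipeline: while chunk i's message is on the wire,
    # chunk i+1 computes. The slower stage dominates; the faster stage
    # is hidden except for one startup/drain term.
    return n * max(compute, comm) + min(compute, comm)

# Hypothetical costs: 16 chunks, 1.0 ms compute and 0.5 ms communication each.
serial = total_time_no_overlap(16, 1.0, 0.5)      # 24.0 ms
overlapped = total_time_pipelined(16, 1.0, 0.5)   # 16.5 ms
print(serial, overlapped)
```

In this toy model the communication cost is fully hidden whenever per-chunk compute exceeds per-chunk communication, which is the regime the dissertation's 'spreading' techniques aim to create.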
Owning Collections
- Graduate Dissertations and Theses at Illinois (PRIMARY)
- Dissertations and Theses - Computer Science (Dissertations and Theses from the Dept. of Computer Science)