Scalable heterogeneous computing with asynchronous message-driven execution

Choi, Jaemin

Scalable heterogeneous computing with asynchronous message-driven execution

Choi, Jaemin

Permalink

https://hdl.handle.net/2142/116219

Description

Title

Scalable heterogeneous computing with asynchronous message-driven execution

Author(s)

Choi, Jaemin

Issue Date

2022-07-12

Director of Research (if dissertation) or Advisor (if thesis)

Kale, Laxmikant V

Doctoral Committee Chair(s)

Kale, Laxmikant V

Committee Member(s)

Rauchwerger, Lawrence
Kindratenko, Volodymyr
Bhatele, Abhinav
Garland, Michael

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Heterogeneous computing
GPGPU
Asynchronous message-driven execution
Overdecomposition
Runtime system

Abstract

Computer systems today are becoming increasingly heterogeneous, in response to increasingly demanding performance requirements of both traditional and emerging workloads including computational science, data science, and machine learning, pushing the limits of power and energy imposed by the silicon. Although the problem of data movement costs has been exacerbating as a consequence of increasingly complex memory hierarchies and heterogeneous computing resources, the popular approaches to parallel programming have largely remained to be a mixture of the Message Passing Interface (MPI) and a GPU programming model such as CUDA. Asynchronous message-driven execution, realized in the Charm++ parallel programming system, is an emerging model that has been proven to be effective in traditional CPU-based systems and large-scale parallel execution due to its adaptive features such as computation-communication overlap and dynamic load balancing. However, when applied to modern heterogeneous and GPU-accelerated systems, asynchronous message-driven execution presents many challenges when it comes to realizing overdecomposition and asynchronous progress which are necessary to achieve low overhead and minimal synchronization between the host and device as well as between the parallel work units for performance. In this dissertation, we analyze the issues in realizing efficient asynchronous message-driven execution on modern heterogeneous systems, and introduce new capabilities and approaches to address them in the form of runtime support in the Charm++ parallel programming system. To mitigate communication costs and minimize unnecessary synchronization overheads, we exploit automatic computation-communication overlap driven by overdecomposition and enable GPU-aware communication in the asynchronous message-driven execution model. We also combine these two approaches together to further improve performance and scalability on heterogeneous systems, and explore the effectiveness of techniques such as kernel fusion and CUDA Graphs to reduce the impact of kernel launch overheads especially with strong scaling. Finally, we investigate the possibilities of an entirely GPU-driven runtime system, CharminG, which seeks to realize asynchronous message-driven execution on the GPU with more user-level control of the GPU computing resources, enabled by GPU-resident scheduling, memory management, and messaging mechanisms. We discuss the challenges, limitations and potential improvements of such a GPU-centric approach of parallel programming towards the goal of developing an overarching runtime system that can efficiently utilize all of the available heterogeneous computing resources.

Graduation Semester

2022-08

Type of Resource

Thesis

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Scalable heterogeneous computing with asynchronous message-driven execution

Choi, Jaemin

Permalink

Description

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Log In