Message-driven parallel language runtime design and optimizations for multicore-based massively parallel machines
Mei, Chao
Permalink
https://hdl.handle.net/2142/34238
Description
- Title
- Message-driven parallel language runtime design and optimizations for multicore-based massively parallel machines
- Author(s)
- Mei, Chao
- Issue Date
- 2012-09-18T21:07:22Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Kale, Laxmikant V.
- Doctoral Committee Chair(s)
- Kale, Laxmikant V.
- Committee Member(s)
- Padua, David A.
- Torrellas, Josep
- Balaji, Pavan
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Multicore shared-memory optimizations
- Multithreaded adaptive parallel language runtime
- MPI+OpenMP
- High Performance Computing (HPC)
- Load balancing
- Parallel programming
- Molecular dynamics simulation performance
- Charm++
- Abstract
- Multicore chips have become the standard building blocks for all current and future massively parallel machines. Much work has been done in scientific and engineering HPC applications to exploit shared-memory multicore nodes. This thesis, in contrast, focuses on the parallel language runtime system, the software layer that supports the execution of parallel applications. The essential idea is to parallelize the language runtime with threads, following the same general approach that applications use to take advantage of the shared memory on a multicore node. Using the asynchronous message-driven CHARM++ runtime system as an evaluation platform, we address the key question of how the runtime should be designed and optimized for multicore nodes on parallel machines so that applications running atop it achieve better performance with as few changes as possible. Since runtime performance on a single node is the basis for overall runtime performance at scale, we have identified the key factors for the runtime to run well on a single node and developed corresponding optimization techniques. We have also developed the CkLoop library in the CHARM++ runtime, which showcases the need for a unified runtime that better supports parallelism at different granularities. Furthermore, we have explored the design space for assigning work responsibilities among the threads in the multithreaded runtime. In the context of a runtime design with dedicated communication threads, we have investigated the resulting communication issues with the help of our extension to a performance analysis tool, and proposed methods that resolve them. To achieve even better application performance, we have shown how developers can leverage new capabilities offered by the runtime, and developed new load balancing strategies that are more effective on multicore platforms.
Finally, we have demonstrated performance improvements on real production-level scientific applications, including NAMD, a widely used molecular dynamics simulation program, by using this multithreaded runtime on petascale massively parallel machines. In the case of the 100M-atom STMV simulation using NAMD, the multithreaded runtime enables NAMD to achieve about a two-fold performance improvement on 224,076 cores of JaguarPF (Cray XT5), and about a three-fold improvement in machine utilization on Intrepid (BlueGene/P). It also makes NAMD more scalable, up to the full machines of JaguarPF and Titan (Cray XK6).
- Graduation Semester
- 2012-08
- Permalink
- http://hdl.handle.net/2142/34238
- Copyright and License Information
- Copyright 2012 Chao Mei
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science