A multi-level scalable startup for parallel applications
Gupta, Abhishek
Permalink
https://hdl.handle.net/2142/29453
Description
- Title
- A multi-level scalable startup for parallel applications
- Author(s)
- Gupta, Abhishek
- Issue Date
- 2012-02-01T00:46:59Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Kale, Laxmikant V.
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- parallel startup
- multi-level startup
- Hierarchical
- tree startup
- runtime
- Charm++
- Abstract
- High-performance parallel machines with hundreds of thousands of processors and petascale performance are already in use, and even larger exaflop-scale computing systems, which may have hundreds of millions of cores, are planned. One of the biggest challenges in running parallel applications on machines of such massive scale is the parallel startup process. This task involves two components: (1) parallel launching of appropriate processes on the given set of processors and (2) setting up communication channels to enable the processes to communicate with each other after process launching has completed. Most current startup mechanisms rely on either special-purpose daemons, which waste system resources, or a startup manager, which becomes a scalability bottleneck. In this thesis, we investigate the design and scalability of an SMP-aware, multi-level startup scheme with batching of remote shell sessions, which provides a complete solution to the startup of a parallel application and facilitates its management during execution. The scheme preserves existing Charm++ runtime capabilities, including process health monitoring, facilitation of recovery from failures, and scalable interaction with the application. We demonstrate the performance and scalability of this scheme by applying it to the startup of Charm++ applications. In particular, starting up a Charm++ program on 16,384 cores of Ranger (at TACC) with Ethernet as the underlying communication layer takes only 25 seconds and attains a speedup of over 400% compared to MPICH2-1.3 startup (using Hydra as process manager) and over 800% compared to Open MPI 1.3.1 startup on Ranger.
- Graduation Semester
- 2011-12
- Copyright and License Information
- Copyright 2011 Abhishek Gupta
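The abstract describes a multi-level startup with batching of remote shell sessions but gives no implementation details. The following is only an illustrative sketch of the general idea, not the thesis's algorithm: it assumes a k-ary launch tree in which each launcher starts at most `fanout` children, and it opens sessions in groups of `batch_size`. The function names, the host names, and both parameters are hypothetical, and no remote shells are actually opened.

```python
def build_launch_tree(hosts, fanout):
    """Map each host to the hosts it would launch (hypothetical k-ary layout).

    hosts[0] acts as the root launcher; the host at index i is launched by
    the host at index (i - 1) // fanout, so no launcher has more than
    `fanout` children.
    """
    children = {host: [] for host in hosts}
    for i in range(1, len(hosts)):
        parent = hosts[(i - 1) // fanout]
        children[parent].append(hosts[i])
    return children


def tree_depth(n_hosts, fanout):
    """Number of launch levels below the root needed to reach n_hosts hosts."""
    depth, covered, level_size = 0, 1, 1
    while covered < n_hosts:
        level_size *= fanout      # hosts reachable at the next level
        covered += level_size
        depth += 1
    return depth


def batches(items, batch_size):
    """Split a launcher's children into groups started one batch at a time."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


if __name__ == "__main__":
    # Assumed example: 13 nodes, each launcher starts at most 3 children.
    hosts = [f"node{i}" for i in range(13)]
    tree = build_launch_tree(hosts, fanout=3)
    for launcher, kids in tree.items():
        if kids:
            print(launcher, "launches", batches(kids, batch_size=2))
    print("levels below root:", tree_depth(len(hosts), fanout=3))
```

In this layout the root opens only `fanout` sessions instead of one per node, which is the intuition behind avoiding a single startup manager as a bottleneck; the actual tree shape and batching policy used in the thesis may differ.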
Owning Collections
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois