Withdraw
Loading…
Optimization of FFT communication on 3-D torus and mesh supercomputer networks
Arya, Anshu
Loading…
Permalink
https://hdl.handle.net/2142/45288
Description
- Title
- Optimization of FFT communication on 3-D torus and mesh supercomputer networks
- Author(s)
- Arya, Anshu
- Issue Date
- 2013-08-22T16:34:50Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Kale, Laxmikant V.
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- supercomputing
- Fast Fourier Transform (FFT)
- parallel
- topology
- networks
- communication
- torus
- mesh
- Abstract
- The fast Fourier transform (FFT) is of intense interest to the scientific community. Its utility in a vast range of parallel scientific simulations warrants investigating its efficiency on a leading class of supercomputers that employ 3-D torus and mesh networks. Given the divide-and-conquer approach of the popular Cooley-Tukey approach, a parallel FFT can use a highly optimized serial algorithm for intra-node calculations, but the communication patterns between nodes reveal a potential for improvement. In this work, the scalability and communication patterns of global 1-D and 3-D parallel FFTs and how they map to 3-D torus and mesh networks are investigated. As a starting point, a naive implementation of a 1-D FFT is modified and scaled to 65,536 cores of a BlueGene/P machine. Insights into the machine architecture and network limits gained from the 1-D FFT are then applied to improve the performance of a 3-D FFT. Default mappings of a 3-D FFT onto a torus result in low network utilization and sub-optimal communication costs. The communication graph of a 3-D FFT is formulated and graph partitioning techniques are used to find a new mapping that greatly improves performance. Data migration techniques are then used to alleviate possible link contention associated with the new mappings. Migration techniques are discussed and modeled. It is discovered that specific migration patterns are able to reduce link contention without increasing the overall data movement (hop-bytes). With both mapping and migration techniques combined, a decrease of nearly 20% in the running time compared to the default 3-D FFT mapping is observed on 512 nodes, 2048 cores of a BlueGene/P machine.
- Graduation Semester
- 2013-08
- Permalink
- http://hdl.handle.net/2142/45288
- Copyright and License Information
- Copyright 2013 Anshu Arya
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisDissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer ScienceManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…