Performance analysis and optimization of a CFD application
Zhang, Wentao
Permalink
https://hdl.handle.net/2142/88072
Description
- Title
- Performance analysis and optimization of a CFD application
- Author(s)
- Zhang, Wentao
- Issue Date
- 2015-07-20
- Director of Research (if dissertation) or Advisor (if thesis)
- Bodony, Daniel J.
- Department of Study
- Mechanical Science & Engineering
- Discipline
- Mechanical Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Performance optimization
- computational fluid dynamics (CFD)
- Intel Xeon Phi
- Abstract
- This thesis documents the analysis and optimization of a high-order finite difference computational fluid dynamics (CFD) application (PlasComCM). Performance bottlenecks were identified using performance tools and hardware counters. The performance analysis of PlasComCM showed that the quantity of memory accesses and the lack of vectorization inhibited optimal serial performance on an x86-based CPU. Optimization techniques including pointer dereferencing, loop transformation and Fortran SIMD directives were applied to the 10 most time-consuming subroutines to remove obstacles to vectorization and to improve serial performance. Details of the optimization techniques are presented and their impact on performance is discussed. The optimization efforts yielded a 63% reduction in the number of memory loads and a serial speedup of 2.02. Using the optimized serial program as the codebase, further investigation focused on the analysis and optimization of parallel heterogeneous execution on a dual-socket node fitted with an Intel Xeon Phi MIC card. To reduce the overhead created by host-accelerator copies in heterogeneous execution, the data layout of the halo region was changed from a "star" shape to a "box" shape to agglomerate small communications and to create a larger work granularity. Preliminary results of running PlasComCM on Intel Xeon Phis in symmetric mode are also presented: for a particular problem size, a 20% reduction in wall-clock time was obtained when using two Sandy Bridge sockets plus one Xeon Phi card compared with two Sandy Bridge sockets alone.
- Graduation Semester
- 2015-8
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/88072
- Copyright and License Information
- Copyright 2015 Wentao Zhang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois