Withdraw
Loading…
A language and a system for program optimization
Santos Faria Xavier Teixeira, Thiago
Loading…
Permalink
https://hdl.handle.net/2142/109371
Description
- Title
- A language and a system for program optimization
- Author(s)
- Santos Faria Xavier Teixeira, Thiago
- Issue Date
- 2020-11-20
- Director of Research (if dissertation) or Advisor (if thesis)
- Padua, David
- Doctoral Committee Chair(s)
- Gropp, William
- Committee Member(s)
- Adve, Vikram
- Ancourt, Corinne
- Amarasingue, Saman
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- compilers
- code generation
- program optimization
- domain-specific language
- Abstract
- Hardware complexity has increased over time, and as architectures evolve and new ones are adopted, programs must often be altered by numerous optimizations to attain maximum computing power on each target environment. As a result, the code becomes unrecognizable over time, hard to maintain, and challenging to modify. Furthermore, as the code evolves, it is hard to keep the optimizations up to date. The need to develop and maintain separate versions of the application for each target platform is an immense undertaking, especially for the large and long-lived applications commonly found in the high-performance computing (HPC) community. This dissertation presents Locus, a new system, and a language for optimizing complex, long-lived applications for different platforms. We describe the requirements that we believe are necessary for making automatic performance tuning widely adopted. We present the design and implementation of a system that fulfills these requirements. It includes a domain-specific language that can represent complex collections of transformations, an interface to integrate external modules, and a database to manage platform-specific efficient code. The database allows the system’s users to access optimized code without having to install the code generation toolset. The Locus language allows the definition of a search space combined with the programming of optimization sequences separated from the application’s reference code. After all, we present an approach for performance portability. Our thesis is that we can ameliorate the difficulty of optimizing applications using a methodology based on optimization programming and automated empirical search. Our system automatically selects, generates, and executes candidate implementations to find the one with the best performance. We present examples to illustrate the power and simplicity of the language. The experimental evaluation shows that exploring the space of candidate implementations typically leads to better performing codes than those produced by conventional compiler optimizations that are based solely on heuristics. Locus was able to generate a matrix-matrix multiplication code that outperformed the IBM XLC internal hand-optimized version by 2× on the Power 9 processors. On Intel E5, Locus generates code with performance comparable to Intel MKL’s. We also improve performance relative to the reference implementation of up to 4× on stencil computations. Locus ability to integrate complex search spaces with optimization sequences can result in very complicated optimization programs. Locus compiler applies optimizations to remove from the optimization sequences unnecessary search statements making the exploration for faster implementations more accessible. We optimize matrix transpose, matrix-matrix multiplication, fast Fourier transform, symmetric eigenproblem, and sparse matrix-vector multiplication through divide and conquer. We implement three strategies using the Locus language to create search spaces to find the best shapes of the base case and the best ways of subdividing the problem. The search space representation for the divide-and-conquer strategy uses a combination of recursion and OR blocks. The Locus compiler automatically expands the recursion and ensures that the search space is correctly represented. The results showed that the empirical search was important to improve performance by generating faster base cases and finding the best splitting. We also use Locus to optimize large, complex applications. We match the performance of hand-optimized kernels of the Kripke transport code for different input data layouts. The Plascom2 multi-physics application is optimized to find the best way to use a multi-core CPU and GPU. The use of Tangram, Hydra, and OpenMP provided an interesting search space that improved performance by approximately 4.3× on ZAXPY and ZXDOTY kernels. Lastly, in a similar fashion to how a compiler works, we applied a search space representing a collection of optimization sequences to 856 loops extracted from 16 benchmarks that resulted in good performance improvements.
- Graduation Semester
- 2020-12
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/109371
- Copyright and License Information
- Copyright 2020 Thiago Santos Faria Xavier Teixeira
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisDissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer ScienceManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…