Compiler optimizations for parallel loops with fine-grained synchronization
Chen, Ding-Kai
Permalink
https://hdl.handle.net/2142/19194
Description
Issue Date
1994
Doctoral Committee Chair(s)
Yew, Pen-Chung
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Computer Science
Language
eng
Abstract
Loops in scientific and engineering applications provide a rich source of parallelism. To obtain a higher level of parallelism, loops with loop-carried dependences, which are largely serialized by traditional techniques, need to be parallelized with fine-grained synchronization. This approach, called DOACROSS parallelization, requires new optimization strategies that preserve parallelism while minimizing inter-processor communication. In this thesis, I closely examine the issues involved in DOACROSS parallelization. The work has two foci: (1) increasing parallelism, and (2) reducing communication overhead. Strategies for four major optimization problems are proposed and described in detail in this thesis. Our statement re-ordering and redundant synchronization elimination can enhance the overlap between iterations, with reduced synchronization, for loops with uniform dependences (i.e., with constant dependence distance vectors). For loops with non-uniform dependences, a new dependence uniformization scheme composes a small set of uniform dependences with a small dependence cone. These uniform dependences can be used to preserve all non-uniform dependences, and a small dependence cone size implies a higher speedup and lower communication overhead. Our last loop transformation is for loops whose dependences are unknown at compile time. It is a runtime parallelization technique with good locality and high concurrency, and its performance is fairly consistent because its runtime analysis, usually the performance bottleneck of runtime schemes, requires less global communication. We provide performance measurements and comparisons with previously proposed schemes. The results indicate that our schemes outperform earlier ones in terms of higher parallelism and lower communication requirements. These schemes form an integral part of future high-performance parallelizing compilers.
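To illustrate the idea behind DOACROSS parallelization described in the abstract, here is a minimal sketch in Python of a loop with a loop-carried dependence of distance 1, executed by several workers pipelined with fine-grained post/wait synchronization. The cyclic iteration distribution, the `doacross` name, and the use of `threading.Event` as the post/wait primitive are illustrative assumptions, not the thesis's actual implementation.

```python
import threading

def doacross(n, num_workers=4):
    """Run a[i] = a[i-1] + i for i = 1..n with cross-iteration sync.

    The dependence a[i] -> a[i-1] has constant distance 1, so iterations
    are distributed cyclically and each one waits only for its producer.
    """
    a = [0] * (n + 1)
    # done[i] is "posted" once iteration i has written a[i]
    done = [threading.Event() for _ in range(n + 1)]
    done[0].set()  # the input of iteration 1 is ready up front

    def worker(start):
        # cyclic (round-robin) distribution of iterations to workers
        for i in range(start, n + 1, num_workers):
            done[i - 1].wait()    # wait: consumer side of the dependence
            a[i] = a[i - 1] + i   # loop body with the carried dependence
            done[i].set()         # post: release the dependent iteration

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(1, num_workers + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a[n]
```

With a dependence distance of 1 the loop is fully serialized through the post/wait chain, which is exactly the situation where the abstract's optimizations (reordering statements and removing redundant synchronization) matter: the less of each iteration sits between the wait and the post, the more the iterations can overlap.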