Hardware and compiler support for cache coherence in large-scale shared-memory multiprocessors
Choi, Lynn
Permalink
https://hdl.handle.net/2142/20919
Description
Title
Hardware and compiler support for cache coherence in large-scale shared-memory multiprocessors
Author(s)
Choi, Lynn
Issue Date
1996
Doctoral Committee Chair(s)
Padua, David A.
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Engineering, Electronics and Electrical
Engineering, System Science
Computer Science
Language
eng
Abstract
Reducing memory latency is critical to the performance of large-scale parallel systems. Due to the temporal and spatial locality of memory reference patterns, private caches can eliminate redundant memory accesses and thereby reduce both average memory latency and network traffic. However, maintaining cache coherence for such systems is still a challenge. Hardware directories can be very effective, but are too expensive for large-scale multiprocessors.
As an alternative, compiler-directed techniques (4, 5, 6, 7, 8, 9, 10, 11, 14) can be used to maintain coherence. In this approach, cache coherence is maintained locally without directory hardware, thus avoiding the complexity and overhead associated with hardware directories. Although the performance of such schemes has been demonstrated through simulations, most of the studies either assume perfect compile-time analysis or rely on analytical models without real compiler implementations (1, 3, 9, 10, 12, 13). It is still unknown how effectively the compiler can detect potentially stale references and what kind of performance can be obtained using a real compiler. Also, most of the compiler-directed coherence schemes proposed to date have not addressed the real cost of the required hardware support. For example, many of the schemes require expensive hardware support and assume a cache organization with single-word cache lines.
This dissertation addresses these hardware and compiler implementation issues and investigates the feasibility and performance of the compiler-directed cache coherence approach. We propose a new compiler-directed scheme that can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors. The scheme can be adapted to various cache organizations, including multi-word cache lines and byte-addressable architectures. Several system-related issues, including critical sections, inter-thread communication, and task migration, have also been addressed. The cost of the required hardware support is minimal and proportional to the cache size. The necessary compiler algorithms, including intra- and interprocedural array data flow analysis, have been developed and implemented in the Polaris parallelizing compiler, and experimental results on the Perfect Club benchmarks (2) are discussed.