Reducing Interprocessor Communication in Parallel Architectures: System Configuration and Task Assignment
Abraham, Santosh George
Permalink
https://hdl.handle.net/2142/69392
Description
Title
Reducing Interprocessor Communication in Parallel Architectures: System Configuration and Task Assignment
Author(s)
Abraham, Santosh George
Issue Date
1988
Doctoral Committee Chair(s)
Davidson, Edward S.
Department of Study
Electrical Engineering
Discipline
Electrical Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Engineering, Electronics and Electrical
Abstract
Most future supercomputers will be based on parallel processing architectures. Currently, many applications with sufficient parallelism cannot run efficiently on large parallel processing systems because interprocessor communication and synchronization significantly degrade performance. Interprocessor communication is incurred on message-passing systems when two processors exchange data through communication links, and on shared-memory systems when variables stored in shared memory are written by one processor and then read by another.
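As a miniature illustration of the two styles of communication the abstract contrasts, the sketch below uses Python's standard multiprocessing module as a stand-in for real hardware; the processes and values are invented here and are not drawn from the dissertation.

# Illustrative only: the same producer/consumer exchange expressed first
# as message passing over an explicit link, then through a shared variable.
from multiprocessing import Process, Pipe, Value

def msg_producer(conn):
    conn.send(42)          # message passing: explicit transfer over a link
    conn.close()

def shm_producer(slot):
    slot.value = 42        # shared memory: write a variable another processor reads

if __name__ == "__main__":
    # Message passing: data moves through a communication channel.
    parent, child = Pipe()
    p = Process(target=msg_producer, args=(child,))
    p.start()
    print("received over link:", parent.recv())
    p.join()

    # Shared memory: one process writes, another later reads the same location.
    slot = Value("i", 0)
    q = Process(target=shm_producer, args=(slot,))
    q.start()
    q.join()
    print("read from shared memory:", slot.value)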
Two aspects of the communication bottleneck in multiprocessors, namely delay and bandwidth, are identified. The tradeoff between the cost of the processors and the cost of the communication structure is analyzed, and expressions are derived for the optimum number of processors that minimizes the total cost of achieving a desired performance level. For most combinations of applications and parallel architectures, maximum speedups are limited by communication delays.
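The abstract does not reproduce the derived expressions, but a toy model of the same flavor shows how such an optimum arises; every symbol here (W, T_0, m, b, c_p, c_b) is invented for illustration and is not taken from the dissertation. Suppose work W must complete in time T_0, each unit of work generates m words of communication served at per-processor bandwidth b, processors cost c_p each, and bandwidth costs c_b per unit per processor. Then

\[
T(p,b) = \frac{W}{p}\left(1 + \frac{m}{b}\right) = T_0
\quad\Longrightarrow\quad
p(b) = \frac{W}{T_0}\left(1 + \frac{m}{b}\right),
\]
\[
C(b) = p(b)\,\bigl(c_p + c_b\,b\bigr), \qquad
\frac{dC}{db} = 0
\;\Longrightarrow\;
b^{*} = \sqrt{\frac{m\,c_p}{c_b}}, \qquad
p^{*} = \frac{W}{T_0}\left(1 + \sqrt{\frac{m\,c_b}{c_p}}\right).
\]

Spending more on communication bandwidth lets fewer processors meet the deadline; the optimum configuration balances the two marginal costs.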
Communication delay models are developed for a hierarchical multiprocessor in which all processors share a global memory and are grouped into disjoint clusters, each with a cluster memory shared by the processors in that cluster. Three classes of algorithms are analyzed, and techniques are derived to determine the optimum cluster size, i.e., the number of processors per cluster, that minimizes average communication delays and hence completion times. The effectiveness of hierarchical multiprocessors in emulating common interconnection schemes, viz., ring, mesh, and hypercube, is also analyzed.
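A minimal sketch of the kind of optimization involved appears below. It assumes an invented locality model in which a fraction h(k) = 1 - k^(-alpha) of shared accesses are satisfied by the cluster memory, and cluster-memory latency grows with cluster size due to contention; none of these parameters come from the dissertation's delay models.

# Toy model: choose the cluster size k (a divisor of p) that minimizes
# the average shared-access delay.  All constants are made up.
def optimum_cluster_size(p, d0=2.0, c=0.5, d_global=12.0, alpha=0.5):
    def avg_delay(k):
        h = 1.0 - k ** (-alpha)    # fraction of accesses served in-cluster
        d_cluster = d0 + c * k     # cluster memory slows as more processors share it
        return h * d_cluster + (1.0 - h) * d_global
    candidates = [k for k in range(1, p + 1) if p % k == 0]
    return min(candidates, key=avg_delay)

print(optimum_cluster_size(32))    # -> 4 with these made-up parameters

Small clusters send most traffic to the slow global memory; very large clusters congest the cluster memory; the optimum sits in between.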
The solution of sparse systems of equations on the Alliant FX/8 multiprocessor is used to explore computation-communication tradeoffs. Blocking reduces communication time but also reduces the available parallelism. Optimal block sizes for various blocking schemes are determined, both on the Alliant FX/8 itself and on an emulation of a multiprocessor with a slower global memory. The execution of a linear system solver, Block Solve, on a hierarchical multiprocessor is simulated, and optimum cluster sizes that minimize completion times are determined.
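The same kind of sweep can be sketched for block size: small blocks pay a communication startup cost per block, while large blocks leave processors idle. The cost model below is invented for illustration and is unrelated to the Alliant FX/8 measurements or to Block Solve.

import math

# Toy block-size sweep: (n/b)^2 block tasks, each costing b^3 flops plus
# a transfer of b^2 words with a fixed startup.  All constants are made up.
def best_block_size(n, p, t_flop=1.0, t_startup=100.0, t_word=2.0):
    def completion_time(b):
        tasks = (n // b) ** 2
        per_task = t_flop * b ** 3 + t_startup + t_word * b ** 2
        rounds = math.ceil(tasks / p)   # with few large tasks, processors idle
        return rounds * per_task
    return min((b for b in range(1, n + 1) if n % b == 0), key=completion_time)

print(best_block_size(64, 16))   # -> 8 with these made-up constants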
Algorithms that use flow-graph techniques to assign tasks to processors, with the objective of minimizing total communication cost, are described. One such algorithm obtains a complete assignment whose worst-case cost is less than twice the optimum.
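The abstract does not spell out the construction, but a classic flow-graph formulation in this family is Stone's two-processor assignment, in which a minimum s-t cut simultaneously charges execution and communication costs. The sketch below, with invented task costs, illustrates that formulation only; it is not the dissertation's n-processor algorithm or its 2-approximation.

from collections import defaultdict, deque

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; return the source side of a minimum cut."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Push the bottleneck amount along the path found.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
    # Nodes still reachable from the source form its side of the min cut.
    side, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v, c in residual[u].items():
            if c > 0 and v not in side:
                side.add(v)
                queue.append(v)
    return side

def assign_tasks(cost_on_p1, cost_on_p2, comm):
    """Stone-style two-processor assignment via a minimum cut."""
    cap = defaultdict(dict)
    for t in cost_on_p1:
        cap["P1"][t] = cost_on_p2[t]       # cut if t is assigned to P2
        cap[t]["P2"] = cost_on_p1[t]       # cut if t is assigned to P1
    for (i, j), c in comm.items():
        cap[i][j] = cap[i].get(j, 0) + c   # cut if i and j are separated
        cap[j][i] = cap[j].get(i, 0) + c
    side = min_cut_source_side(cap, "P1", "P2")
    return {t: ("P1" if t in side else "P2") for t in cost_on_p1}

# Invented example: t1 runs much faster on P1 and communicates only
# lightly with t2, so the cut places t1 on P1 and t2, t3 on P2 (cost 10).
print(assign_tasks(
    cost_on_p1={"t1": 2, "t2": 9, "t3": 8},
    cost_on_p2={"t1": 9, "t2": 2, "t3": 3},
    comm={("t1", "t2"): 3, ("t2", "t3"): 6},
))

The minimum cut's capacity equals the assignment's total cost: each execution-cost edge is cut exactly when the task lands on the other processor, and each communication edge is cut exactly when its endpoints are separated.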