Performance evaluation of vector machine architectures
Tang, Ju-ho
Permalink
https://hdl.handle.net/2142/20397
Description
Title
Performance evaluation of vector machine architectures
Author(s)
Tang, Ju-ho
Issue Date
1989
Doctoral Committee Chair(s)
Davidson, Edward S.
Department of Study
Electrical and Computer Engineering
Discipline
Electrical Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Engineering, Electronics and Electrical
Language
eng
Abstract
Vector machines are well known for their high peak performance, but their delivered performance varies greatly across workloads and depends strongly on compiler optimizations. Recently it has been claimed that several horizontal superscalar architectures, e.g., VLIW and polycyclic architectures, provide more balanced performance across a wider range of scientific workloads than vector machines do. The purpose of this research is to study the performance of register-register vector processors, such as Cray supercomputers, as a function of their architectural features, scheduling schemes, compiler optimization capabilities, and program parameters. The results of this study also provide a basis for comparing vector machines with horizontal superscalar machines.
An evaluation methodology, based on timing parameters, bottlenecks, and run time bounds, is developed. Cray-1 performance is degraded by the multiple memory loads of index-misaligned vectors and the inability of the Cray Fortran Compiler (CFT) to produce code that hits all the chain slot times. The Cray X-MP processor has three memory ports and supports flexible chaining, but its vector register reservation scheme poses a problem for the current CFT compilers, thereby reducing execution concurrency. The causes of the performance differences between two Cray Fortran compilers, CFT1.14 and CFT77(1.3), on the vectorized Livermore Fortran Kernels (LFKs) are identified, and some areas for further improvement are suggested.
The impact of chaining and two instruction scheduling schemes on one-memory-port vector supercomputers, illustrated by the Cray-1 and Cray-2, is studied. The lack of instruction chaining on the Cray-2 requires a different instruction scheduling scheme from that of the Cray-1. Situations are characterized in which simple vector scheduling can generate code that fully utilizes one functional unit for machines with chaining. Even without chaining, polycyclic scheduling guarantees full utilization of one functional unit, after an initial transient, for loops with acyclic dependence graphs.
The effectiveness of applying polycyclic vector scheduling (PVS) to the Cray-2 is compared with that of optimal simple vector scheduling on the Cray-1. PVS achieves a performance improvement of more than 30% on several vectorized LFKs over the current CFT77(2.0) compiler on the Cray-2. Some hardware modifications that could improve the effectiveness of applying PVS are evaluated.
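As a rough illustration of why chaining matters on a one-memory-port machine, the following sketch compares cycle counts for a chain of dependent vector instructions with and without chaining. It uses a simplified textbook timing model, not the dissertation's methodology: the function names and the startup latencies are hypothetical, and the chained case assumes every instruction hits its chain slot time exactly.

```python
def unchained_cycles(startups, vector_length):
    """Without chaining, each dependent vector instruction waits until the
    previous one has delivered all of its elements."""
    return sum(s + vector_length for s in startups)


def chained_cycles(startups, vector_length):
    """With ideal chaining, each instruction starts as soon as the first
    element of its operand is available, so only the startups add up."""
    return sum(startups) + vector_length


# Hypothetical startup latencies (cycles) for load -> multiply -> add.
startups = [12, 7, 6]
vector_length = 64

print(unchained_cycles(startups, vector_length))  # 217
print(chained_cycles(startups, vector_length))    # 89
```

Under these assumed numbers the chained sequence finishes in well under half the time. A machine without chaining has to recover that gap by other means, such as overlapping independent instructions from different loop iterations.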