Fault and Error Latency Under Real Workload - an Experimental Study
Chillarege, Ram
Permalink
https://hdl.handle.net/2142/69341
Description
Title
Fault and Error Latency Under Real Workload - an Experimental Study
Author(s)
Chillarege, Ram
Issue Date
1986
Department of Study
Electrical Engineering
Discipline
Electrical Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Engineering, Electronics and Electrical
Abstract
This thesis demonstrates a practical methodology for the study of fault and error latency under real workload. This is the first study that measures and quantifies the latency under real workload and fills a major gap in the current understanding of workload-failure relationships. The methodology is based on low level data gathered on a VAX 11/780 during the normal workload conditions of the installation. Fault occurrence is simulated on the data, and the error generation and discovery process is reconstructed to determine latency. The analysis proceeds to combine the low level activity data with high level machine performance data to yield a better understanding of the phenomenon. This study finds a strong relationship between latency and workload and quantifies the relationship. The sampling and reconstruction techniques used are also validated.
Error latency in the memory where the operating system resides is studied using data on physical memory access. These data are gathered through hardware probes in the machine that sample the system during the normal workload cycle of the installation. The technique provides a means to study the system under different workloads and over multiple days. These data are used to reconstruct the error discovery process in the system. An approach to determine the fault miss percentage is developed, and a verification of the entire methodology is also performed. This study finds that the mean error latency, in the memory containing the operating system, varies by a factor of 10 to 1 (in hours) between the low and high workloads. It is also found that of all errors occurring within a day, 70% are detected in the same day, 82% within the following day, and 91% within the third day.
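The reconstruction idea in this paragraph can be illustrated with a minimal Python sketch. It is not the thesis's actual procedure: it assumes, purely for illustration, that an error is discovered at the first recorded access to the affected location at or after the error occurs, and that an error with no later access in the trace counts as a miss. The names `error_latency` and `summarize`, and the trace layout (a dictionary mapping an address to its sorted access times), are hypothetical.

```python
from bisect import bisect_left


def error_latency(access_times, error_time):
    """Time from error_time to the next access, or None if there is none.

    access_times must be sorted in ascending order.
    Assumption: the first access at or after the error discovers it.
    """
    i = bisect_left(access_times, error_time)
    if i == len(access_times):
        return None  # never accessed again within the trace: a miss
    return access_times[i] - error_time


def summarize(trace, injections):
    """Return (mean latency, miss percentage) over simulated errors.

    trace: dict mapping address -> sorted list of access times
    injections: list of (address, error_time) pairs
    """
    latencies = []
    misses = 0
    for address, error_time in injections:
        latency = error_latency(trace.get(address, []), error_time)
        if latency is None:
            misses += 1
        else:
            latencies.append(latency)
    mean = sum(latencies) / len(latencies) if latencies else None
    miss_pct = 100.0 * misses / len(injections) if injections else 0.0
    return mean, miss_pct


# Toy data, invented for illustration only.
trace = {0x10: [1.0, 5.0, 9.0], 0x20: [2.0]}
injections = [(0x10, 3.0), (0x20, 4.0), (0x10, 9.0), (0x30, 0.0)]
mean_latency, miss_percentage = summarize(trace, injections)
```

With a sampled trace, an access that falls between samples would go unrecorded, so latencies computed this way would be upper bounds under the stated assumption.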
Fault latency in the paged sections of memory is determined using data from physical memory scans. Fault latency distributions are generated for stuck-at-0 (s-a-0) and stuck-at-1 (s-a-1) permanent fault models. Results show that the mean fault latency of an s-a-0 fault is nearly 5 times that of an s-a-1 fault. Performance data gathered on the machine are used to study workload-latency behavior. An analysis of variance model that quantifies the relative influence of various workload measures on the evaluated latency is also given.
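A stuck-at fault stays latent for as long as the affected bit is supposed to hold the stuck value anyway. The following Python sketch shows how a fault latency could be read off a series of memory scans under that model. It is an assumption-laden illustration rather than the method used in the thesis: the function name `fault_latency`, the per-bit scan representation, and the toy data are all hypothetical.

```python
def fault_latency(scan_times, bit_values, fault_time, stuck_at):
    """Time until a stuck-at fault first produces a wrong value.

    scan_times: ascending times at which the memory was scanned
    bit_values: fault-free value (0 or 1) of the bit at each scan
    fault_time: time at which the permanent fault is assumed to occur
    stuck_at:   0 for an s-a-0 fault, 1 for an s-a-1 fault

    Assumption: the fault becomes an error at the first scan, at or after
    fault_time, where the fault-free value differs from the stuck value.
    Returns None if that never happens within the scans.
    """
    for t, value in zip(scan_times, bit_values):
        if t >= fault_time and value != stuck_at:
            return t - fault_time
    return None


# Toy scan of one bit, invented for illustration only.
scan_times = [0, 10, 20, 30]
bit_values = [0, 0, 1, 1]

sa0 = fault_latency(scan_times, bit_values, fault_time=5, stuck_at=0)
sa1 = fault_latency(scan_times, bit_values, fault_time=5, stuck_at=1)
```

In this toy scan the s-a-0 fault stays latent longer than the s-a-1 fault only because the bit happens to hold 0 when the fault occurs; the example says nothing about the cause of the difference reported in the abstract.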
Error latency in the microcontrol store is studied using data on microcode access and usage. These data are acquired using probes in the microsequencer of the CPU. It is found that the latency distribution has a large mode between 50 and 100 microcycles and two additional smaller modes. Notably, the error latency distribution in the microcontrol store is not exponential, contrary to what other reported research suggests.