A hybrid fault injection environment for measuring system dependability
Young, Luke Titus
Permalink
https://hdl.handle.net/2142/20674
Description
Title
A hybrid fault injection environment for measuring system dependability
Author(s)
Young, Luke Titus
Issue Date
1993
Doctoral Committee Chair(s)
Iyer, Ravishankar K.
Department of Study
Electrical and Computer Engineering
Discipline
Electrical Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Engineering, Electronics and Electrical
Computer Science
Language
eng
Abstract
This thesis describes a test environment for evaluating computer system dependability, wherein faults are injected via software and the impact is measured by both software and hardware. The hybrid nature of the environment provides advantages in that it introduces minimal perturbation and provides a high degree of control over the location of faults to be injected. With this environment, faults can be injected into any location that has a physical address, e.g., CPU registers, cache, local memory, mass storage, and network controllers. Faults can also be injected into locations allocated to a single, executing user program or even into the kernel, and propagation can be characterized down to the instruction level. The environment is well suited for measuring extremely short error latencies. We illustrate the environment by applying it to the study of two commercial systems: a Unix-based Tandem Integrity system and a Texas Instruments Explorer II Lisp machine.
Featured capabilities of the environment yielded several key results. High degrees of accuracy in measuring latency (within 20 ns) were obtained. Measurements of the sensitivity of different instructions to faults indicate a 5 percent chance that a faulted MIPS RISC instruction will not fail when executed. Modeling of multi-level error propagation shows that error detections were due to multiple corruptions of state in as many as 57 percent of reads to wrong addresses and 37 percent of writes to wrong addresses. The median latency associated with error detection by an individual CPU was on the order of 10 μs, and the median delay between detection and the start of CPU shutdown was on the order of 100 ms. Kernel fault injection studies show that a fault in the kernel is 2.6 times as likely to bring down a CPU as a fault elsewhere.