A Theory for Algorithm-Based Fault Tolerance in Array Processor Systems

Banerjee, Prithviraj

A Theory for Algorithm-Based Fault Tolerance in Array Processor Systems

Banerjee, Prithviraj

This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with your NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.

Permalink

https://hdl.handle.net/2142/69305

Description

Title

A Theory for Algorithm-Based Fault Tolerance in Array Processor Systems

Author(s)

Banerjee, Prithviraj

Issue Date

1985

Department of Study

Electrical Engineering

Discipline

Electrical Engineering

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Computer Science

Abstract

The concept of algorithm-based fault-tolerance deals with system-level methods for obtaining reliable results from computations, especially when performed on array processor systems. In this scheme, algorithms have their outputs encoded in a system-level error-detecting or -correcting code. The algorithms rearrange the computations to allow an array processor system with a faulty processor to produce either a noncodeword output or the correct output. The algorithms also provide appropriate error-checking procedures which operate on the high-level encoded data.
This thesis deals with a theoretical study of the scheme of algorithm-based fault tolerance and addresses four issues. First, it deals with some design issues of specific fault-tolerant and fault-secure schemes. Algorithms are classified into broad classes called paradigms which are determined exclusively by the communication patterns of the processors. Fault-secure techniques are presented for three powerful paradigms: the multiplex, the recursive combination, and the multiplex-demultiplex paradigms.
The second part deals with the development of a model which can be used to analyze the fault-detecting and -locating capabilities of such algorithms. The model uses a broad interpretation of errors, faults and checks, which are represented as a tripartite graph. Three parameters are introduced to characterize the fault-tolerance scheme: the closure index, the masking index and the exposure index. Necessary and sufficient conditions for detecting and locating faults in processors during the actual computations are discussed. The model takes into account the data flow during a computation and can, therefore, pinpoint exactly which faulty processors can affect which data elements.
In the third part, some graph-theoretic bounds are presented on various useful characteristics in algorithm-based fault tolerance. The model is used to determine bounds on the number of data elements that a processor may affect while allowing t-fault detection or t-fault location. Using these results, some upper and lower bounds are presented on the number of checks required to achieve detection or location. Finally, in order to estimate the overhead required in this fault-tolerant scheme, some bounds are derived on the number of processors and the time required for the execution of the checks.
The last part of the thesis deals with a probabilistic study of the scheme. Expressions for the reliability of the results of the computation, and the time for the completion of the computation using a particular algorithm, are derived in terms of various parameters: the number of processors involved, the time for execution, and the fault-detecting, -locating, and -tolerating capabilities of the algorithm. Using the results of the probabilistic model, various alternative approaches of algorithm-based fault tolerance for a given problem are effectively compared.

Graduation Semester

1985

Type of Resource

text

Permalink

http://hdl.handle.net/2142/69305

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Electrical and Computer Engineering

Dissertations and Theses in Electrical and Computer Engineering

A Theory for Algorithm-Based Fault Tolerance in Array Processor Systems

Banerjee, Prithviraj

Permalink

Description

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Dissertations and Theses - Electrical and Computer Engineering

Log In