Concurrent checkpointing for fast recovery in object-based systems
DeGroat, Joanne Elizabeth
Permalink
https://hdl.handle.net/2142/22219
Description
Title
Concurrent checkpointing for fast recovery in object-based systems
Author(s)
DeGroat, Joanne Elizabeth
Issue Date
1991
Doctoral Committee Chair(s)
Davidson, Edward S.
Department of Study
Electrical and Computer Engineering
Discipline
Electrical and Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Engineering, Electronics and Electrical
Computer Science
Language
eng
Abstract
Traditional checkpoint and recovery are based upon two basic assumptions. The first is the need to halt the computation in progress to save the state of the computation, i.e., take the checkpoint. The second assumption is that the entire state needs to be saved. These assumptions introduce fixed overhead into the system to take the checkpoint and consume space for variables whose state need not be saved. This research investigates a means of breaking these assumptions by developing an architecture that is capable of transparently saving the state of the executing process and of saving only that information required for recovery should an error occur. It also investigates a method of intermediate level recovery, i.e., recovery at levels above a single instruction and lower than that of checkpointing.
A model of computation is developed first to examine the nature and behavior of programs. The model breaks a program into basic blocks, segments of maximal in-line code. Intermediate blocks, blocks composed of several basic blocks, provide a representation at a higher level than basic blocks. At all levels, the model reveals significant information about computations and indicates an approach for the architecture.
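As an illustration of the basic-block decomposition mentioned above, the following minimal sketch uses the standard "leader" rule: a block starts at the first instruction, at every branch target, and at every instruction following a branch. The instruction format (a list of `(opcode, target)` tuples), the opcode names `br` and `jmp`, and the function name `basic_blocks` are assumptions made for this example and are not taken from the dissertation.

```python
def basic_blocks(instrs):
    """Split a program into basic blocks.

    instrs is a list of (opcode, target) tuples (a hypothetical format):
    target is the index of the branch destination, or None for
    non-branching instructions. Returns a list of (start, end) index
    pairs, with end exclusive.
    """
    branch_ops = {"br", "jmp"}  # assumed opcode names for this sketch
    if not instrs:
        return []

    leaders = {0}  # the first instruction always starts a block
    for i, (op, target) in enumerate(instrs):
        if op in branch_ops:
            if target is not None:
                leaders.add(target)      # a branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)       # so does the instruction after a branch

    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return list(zip(starts, ends))


program = [
    ("add", None),
    ("br", 3),      # conditional branch to index 3
    ("sub", None),
    ("mul", None),
    ("jmp", 0),     # loop back to the start
]
blocks = basic_blocks(program)
```

Intermediate blocks in the sense of the abstract could then be formed by grouping several consecutive entries of `blocks`; how the dissertation chooses those groupings is not stated in this record.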
The architecture, called the recovery architecture, is based on the concepts of objects and employs capability addressing. A means of transparently saving state, based on capability addressing, is developed. The method is called concurrent checkpointing and saves only that information required for recovery.
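The idea of saving only the information required for recovery can be illustrated in software with a first-write undo log. This is a swapped-in technique for illustration only: the abstract describes a hardware mechanism based on capability addressing, whose details are not given in this record. The class `RecoverableObject` and its method names are hypothetical.

```python
_MISSING = object()  # marks a field that did not exist at block entry


class RecoverableObject:
    """Object whose state can be rolled back to the start of the current block.

    Only fields that are actually written are saved, and each is saved
    once, on its first write in the block. This is an illustrative
    software sketch, not the dissertation's architecture.
    """

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._undo = {}  # field name -> value at block entry

    def read(self, name):
        return self._fields[name]

    def write(self, name, value):
        # Save the old value only on the first write in this block.
        if name not in self._undo:
            self._undo[name] = self._fields.get(name, _MISSING)
        self._fields[name] = value

    def commit(self):
        """Block finished without error: discard the saved state."""
        self._undo.clear()

    def rollback(self):
        """Error detected: restore every modified field to its entry value."""
        for name, old in self._undo.items():
            if old is _MISSING:
                self._fields.pop(name, None)
            else:
                self._fields[name] = old
        self._undo.clear()


obj = RecoverableObject(x=1, y=2)
obj.write("x", 10)
obj.write("x", 11)   # second write to x saves nothing further
saved = dict(obj._undo)
obj.rollback()       # x returns to 1; y was never copied
```

Because state is saved as a side effect of ordinary writes, the computation never halts to take a checkpoint, and untouched fields such as `y` cost no checkpoint space.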
An evaluation of the architecture shows that it is capable of very fast recovery should an error occur. At a fine level of granularity where the program is broken into numerous blocks, the expected time of execution, even under very high error rates, may be close to the time of execution when no errors occur. The architecture has application in areas such as aircraft flight control and tracking systems where the expected time of execution of tasks is critical.
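The claim about fine granularity can be made concrete with a simple textbook model, which is an assumption of this example and not necessarily the evaluation model used in the dissertation: errors arrive as a Poisson process with rate `error_rate`, an error forces the current block to restart after a fixed `recovery_cost`, and the program is split into equal-length blocks. Under those assumptions the expected time to complete one block of length t is (1/λ + r)(e^{λt} − 1), and the total is that value times the number of blocks.

```python
import math


def expected_time(total_work, blocks, error_rate, recovery_cost=0.0):
    """Expected completion time under the restart model described above.

    total_work     error-free execution time of the whole program
    blocks         number of equal-length recovery blocks
    error_rate     Poisson error rate (errors per unit time)
    recovery_cost  fixed time spent recovering after each error
    All parameter names are hypothetical and chosen for this sketch.
    """
    if error_rate == 0:
        return float(total_work)
    block_len = total_work / blocks
    per_block = (1.0 / error_rate + recovery_cost) * math.expm1(error_rate * block_len)
    return blocks * per_block


# Same program and error rate, coarse versus fine granularity.
coarse = expected_time(100.0, blocks=1, error_rate=0.05)
fine = expected_time(100.0, blocks=10_000, error_rate=0.05)
```

In this model `coarse` is many times the error-free time of 100, while `fine` is within a fraction of a percent of it, which matches the qualitative statement in the abstract. With a nonzero `recovery_cost` r, the fine-grained limit is the error-free time multiplied by (1 + λr).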