Algorithmic approaches to enhancing and exploiting application-level error tolerance
Sloan, Joseph
Permalink
https://hdl.handle.net/2142/46706
Description
- Title
- Algorithmic approaches to enhancing and exploiting application-level error tolerance
- Author(s)
- Sloan, Joseph
- Issue Date
- 2014-01-16T17:59:47Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Kumar, Rakesh
- Doctoral Committee Chair(s)
- Kumar, Rakesh
- Committee Member(s)
- Vaidya, Nitin H.
- Gropp, William D.
- Abraham, Jacob A.
- Bronevetsky, Greg
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Fault Tolerance
- Application-level Error Tolerance
- Algorithmic Based Fault Tolerance (ABFT)
- Application Robustification
- Stochastic Processors
- Reliability and Hardware Variability
- Error localization
- Partial Recomputation
- Robust Sparse Linear Algebra
- Algorithmic Selection for Error Resilience
- Abstract
- As late-CMOS process scaling leads to increasingly variable circuits/logic and as most post-CMOS technologies in sight appear to have largely stochastic characteristics, hardware reliability has become a first-order design concern. To make matters worse, emerging computing systems are becoming increasingly power constrained. Traditional hardware/software approaches are likely to be impractical for these power-constrained systems due to their heavy reliance on redundant, worst-case, and conservative designs. The primary goal of this research has been to investigate how we can leverage inherent application and algorithm characteristics (e.g., natural error resilience, spatial and temporal reuse, and fault containment) to build more efficient robust systems. This dissertation describes algorithmic approaches that leverage application and algorithm awareness for building such systems. These approaches include (a) application-specific techniques for low-overhead fault detection, (b) an algorithmic approach for error correction using localization, (c) selection of scientific computing solver schemes to leverage application-level error resilience, and (d) a numerical optimization-based methodology for converting applications into a more error-tolerant form. This dissertation shows that application and algorithm awareness can significantly increase the robustness of computing systems, while also reducing the cost of meeting reliability targets.
- Graduation Semester
- 2013-12
- Permalink
- http://hdl.handle.net/2142/46706
- Copyright and License Information
- Copyright 2013 Joseph Augustyn Sloan
Owning Collections
Dissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer Engineering
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois