From detection to optimization: impact of soft errors on high-performance computing applications
Calhoun, Jon Cameron
Permalink
https://hdl.handle.net/2142/98379
Description
Title
From detection to optimization: impact of soft errors on high-performance computing applications
Author(s)
Calhoun, Jon Cameron
Issue Date
2017-07-12
Director of Research (if dissertation) or Advisor (if thesis)
Snir, Marc
Doctoral Committee Chair(s)
Olson, Luke N.
Committee Member(s)
Gropp, William
Cappello, Franck
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
High-performance computing
Fault tolerance
Silent data corruption
Soft errors
Error detection
Error recovery
Fault injection
Error propagation
Lossy compression
Checkpoint-restart
Abstract
As high-performance computing (HPC) continues to progress, constraints on HPC system design force the handling of errors to higher levels in the software stack. Of the types of errors facing HPC, soft errors that silently corrupt system or application state are among the most severe. Understanding the behavior of HPC applications in the presence of soft errors is critical for the effective utilization of HPC systems. This understanding can guide the development of algorithm-based error detection informed by application characteristics obtained from fault injection and error propagation studies. Furthermore, the realization that applications are tolerant to small errors allows optimizations such as lossy compression on high-cost data transfers. Lossy compression introduces small, user-controllable amounts of error when compressing data, reducing data size before expensive data transfers and thereby saving time. This dissertation investigates and improves the resiliency of HPC applications to soft errors, and explores lossy compression as a new form of optimization for expensive, time-consuming data transfers.