Withdraw
Loading…
Record and deterministic replay of parallel programs on multiprocessors
Honarmand, Nima
Loading…
Permalink
https://hdl.handle.net/2142/72886
Description
- Title
- Record and deterministic replay of parallel programs on multiprocessors
- Author(s)
- Honarmand, Nima
- Issue Date
- 2015-01-21
- Director of Research (if dissertation) or Advisor (if thesis)
- Torrellas, Josep
- Doctoral Committee Chair(s)
- Torrellas, Josep
- Committee Member(s)
- Adve, Sarita V.
- Adve, Vikram S.
- King, Samuel T.
- Pokam, Gilles
- Narayanasamy, Satish
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Record and Replay
- Deterministic Replay
- Parallel Programming
- Multiprocessors
- Computer Architecture
- Operating Systems
- Abstract
- Record and deterministic Replay (RnR) is a primitive with many proposed applications in computer systems, including debugging, security and fault tolerance. RnR is typically a two phase process: in the first phase (record) enough information about an execution is logged which is then used in the second phase (replay) to re-create the execution. Application-level RnR seeks to record and replay single programs (or sets of programs) in isolation from the rest of the system. In this environment, there are typically two sources of non-determinism that an RnR solution should capture: program inputs (such as the results of the system calls the program makes to the OS or the signals the program receives) and the memory-access interleaving of concurrent threads that result in inter-thread data dependences. In order to enjoy wide acceptance, RnR solutions should be practical to build and, at the same time, enable a diverse range of use-cases (such as debugging and security analysis). Low recording overhead is a key requirement for many use cases of RnR. While software can often be used to record program inputs with low overhead, it can incur significant overhead to record memory-access interleaving. To reduce this overhead, hardware-assisted RnR techniques have been proposed. The main challenge here is devising hardware mechanisms that are simple enough to be embraced by processor vendors and, at the same time, powerful enough to work for the complex architectures of today. The first part of this thesis is a step in this direction—i.e., building practical and low overhead hardware-assisted RnR systems. We focus on the hardware-assisted RnR of parallel programs on multiprocessor machines. Firstly, we introduce QuickRec, the first physical realization of a hardware-assisted RnR system including new hardware and software. The focus of this project is understanding and evaluating the implementation issues of RnR on a real platform. We demonstrate that RnR can be implemented efficiently on a real multicore Intel Architecture (IA) system. We show that the rate of memory log generation is insignificant, and that the recording hardware has negligible performance overhead, as expected. The evaluations however point to the software stack as the major source of overhead (incurring an average recording overhead of nearly 13%), an issue that was largely ignored by previous work on hardware-assisted RnR. We then address the problem of replay speed by introducing Cyrus, an RnR scheme that can record programs and replay them in parallel without making any changes to the cache coherence protocol and messages. The proposal uses a novel hybrid hardware/software mechanism for recording memory access interleaving. The hardware component records a raw and incomplete log that is then processed and transformed into a complete log by an on-the-fly software Backend Pass. As the raw log is being generated, this pass transforms it for high replay parallelism. This can also flexibly trade-off replay parallelism for log size. We evaluate Cyrus through full-system simulation including simulated hardware and using the same real software stack that was used in QuickRec. QuickRec and Cyrus are limited in terms of the memory consistency models they support: Total Store Order (TSO) and Sequential Consistency (SC), respectively. To enable RnR for other architectures whose memory model is more relaxed, we then propose RelaxReplay. It is a general hardware-assisted MRR scheme that works for any relaxed-consistency model of current processors and does not require any changes to the underlying coherence protocol and messages. RelaxReplay’s core innovation is a new way of capturing memory access reordering. Each memory instruction goes through a post-completion in-order counting step that detects any reordering, and efficiently records it. The evaluations show that RelaxReplay induces negligible recording overhead and that the average size of the log produced is only 1–4x as large as in existing solutions— still very small compared to the memory bandwidth of modern machines. After considering the challenges of building practical RnR systems, the next question to be answered is that of their usability. The last part of this thesis investigates the issue of using the RnR technology in program debugging, the most commonly cited use-case of replay. RnR enables deterministic reproduction of hard-to-repeat software bugs. However, simply providing support for repeatedly stumbling on the same bug does not help diagnose it. For bug diagnosis, developers typically augment the program source with debug code—E.g., by creating and operating on new variables, or printing state. Unfortunately, this renders the RnR log inconsistent and makes Replay Debugging (i.e., debugging while using an RnR log for replay) dicey at best. To attack this problem, we propose rdb, the first scheme for replay debugging that guarantees exact replay in the presence of debug code. rdb relies on two mechanisms. The first one is compiler support to split the instrumented application into two executables: one that is identical to the original program binary, and another that encapsulates all the added debug code. The second mechanism is a runtime infrastructure that replays the application and, without affecting it in any way, invokes the appropriate debug code at the appropriate locations. We describe an implementation of rdb based on LLVM and Pin, and show an example of how rdb’s replay debugging helps diagnose a real bug.
- Graduation Semester
- 2014-12
- Permalink
- http://hdl.handle.net/2142/72886
- Copyright and License Information
- Copyright 2014 Nima Honarmand
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisDissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer ScienceManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…