Robust and reliable hardware accelerator design through high-level synthesis

Campbell, Keith A

Permalink

https://hdl.handle.net/2142/99294

Description

Title

Robust and reliable hardware accelerator design through high-level synthesis

Author(s)

Campbell, Keith A

Issue Date

2017-09-21

Director of Research (if dissertation) or Advisor (if thesis)

Chen, Deming

Doctoral Committee Chair(s)

Chen, Deming

Committee Member(s)

Hwu, Wen-Mei W.
Wong, Martin D. F.
Kim, Nam Sung

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

High-level synthesis (HLS)
Automation
Error detection
Scheduling
Binding
Compiler transformation
Compiler optimization
Pipelining
Modulo arithmetic
Modulo-3
Logic optimization
State machine
Datapath
Control logic
Shadow datapath
Modulo datapath
Low cost
High performance
Electrical bug
Aliasing
Stuck-at fault
Soft error
Timing error
Checkpointing
Rollback
Recovery
Pre-silicon validation
Post-silicon validation
Pre-silicon debug
Post-silicon debug
Accelerator
System on a chip
Signature generation
Execution signature
Execution hash
Logic bug
Nondeterministic bug
Masked error
Circuit reliability
Hot spot
Wear out
Silent data corruption
Observability
Detection latency
Mixed datapath
Diversity
Checkpoint corruption
Error injection
Error removal
Quick Error Detection (QED)
Hybrid Quick Error Detection (H-QED)
Instrumentation
Hybrid co-simulation
Hardware/software
Integration testing
Hybrid tracing
Hybrid hashing
Source-code localization
Software debugging tool
Valgrind
Clang sanitizer
Clang static analyzer
Cppcheck
Root cause analysis
Execution tracing
Realtime error detection
Simulation trigger
Nonintrusive
Address conversion
Undefined behavior
High-level synthesis (HLS) bug
Detection coverage
Gate-level architecture
Mersenne modulus
Full adder
Half adder
Quarter adder
Wraparound
Modulo reducer
Modulo adder
Modulo multiplier
Modulo comparator
Cross-layer
Algorithm
Instruction
Architecture
Logic synthesis
Physical design
Algorithm-based fault tolerance (ABFT)
Error detection by duplicated instructions (EDDI)
Parity
Flip-flop hardening
Layout design through error-aware transistor positioning dual interlocked storage cell (LEAP-DICE)
Cost-effective
Place-and-route
Field programmable gate array (FPGA) emulation
Application specific integrated circuit (ASIC)
Field programmable gate array (FPGA)
Energy
Area
Latency

Abstract

System-on-chip (SoC) design is becoming increasingly complex as technology scaling enables more and more functionality on a chip. This scaling-driven complexity has resulted in a variety of reliability and validation challenges including logic bugs, hot spots, wear-out, and soft errors. To make matters worse, as we reach the limits of Dennard scaling, efforts to improve system performance and energy efficiency have resulted in the integration of a wide variety of complex hardware accelerators in SoCs. Thus, the challenge is to design complex, custom hardware that is efficient, but also correct and reliable. High-level synthesis shows promise to address the problem of complex hardware design by providing a bridge from the high-productivity software domain to the hardware design process. Much research has been done on high-level synthesis efficiency optimizations. This dissertation shows that high-level synthesis also has the power to address validation and reliability challenges through three automated solutions targeting three key stages in the hardware design and use cycle: pre-silicon debugging, post-silicon validation, and post-deployment error detection.
Our solution for rapid pre-silicon debugging of accelerator designs is hybrid tracing: comparing a datapath-level trace of hardware execution with a reference software implementation at a fine temporal and spatial granularity to detect logic bugs. An integrated backtrace process delivers source-code meaning to the hardware designer, pinpointing the location of bug activation and providing a strong hint for potential bug fixes. Experimental results show that we are able to detect and aid in the localization of logic bugs from both C/C++ specifications and the high-level synthesis engine itself.
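The trace comparison at the heart of hybrid tracing can be illustrated with a minimal sketch. It is not the dissertation's tool: the `TraceEvent` record, its fields, and the `first_divergence` helper are hypothetical names chosen for illustration, and both traces are assumed to be already aligned operation by operation.

```python
from typing import List, NamedTuple, Optional, Tuple


class TraceEvent(NamedTuple):
    """One traced operation (hypothetical record layout, for illustration only)."""
    source_line: int  # C/C++ source line the operation is assumed to map back to
    op: str           # operation label, e.g. "add"
    value: int        # value the operation produced


def first_divergence(
    hw_trace: List[TraceEvent], sw_trace: List[TraceEvent]
) -> Optional[Tuple[int, TraceEvent, TraceEvent]]:
    """Return (index, hardware event, software event) at the first mismatch.

    Returns None when the compared prefix of the two traces agrees.
    Assumes the traces are aligned one event per operation.
    """
    for index, (hw, sw) in enumerate(zip(hw_trace, sw_trace)):
        if hw != sw:
            return index, hw, sw
    return None


# Reference software trace and a hardware trace with one wrong value.
sw = [TraceEvent(10, "mul", 12), TraceEvent(11, "add", 15), TraceEvent(12, "shl", 60)]
hw = [TraceEvent(10, "mul", 12), TraceEvent(11, "add", 14), TraceEvent(12, "shl", 56)]

mismatch = first_divergence(hw, sw)
if mismatch is not None:
    index, hw_event, sw_event = mismatch
    print(f"first mismatch at event {index}, source line {sw_event.source_line}")
```

Reporting the first mismatching event, rather than the final output, is what ties a failure back to the source line where the bug was activated.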
A variation of this solution tailored for rapid post-silicon validation of accelerator designs is hybrid hashing: inserting signature generation logic in a hardware design to create a heavily compressed signature stream that captures the internal behavior of the design at a fine temporal and spatial granularity. This stream is compared with a reference set of signatures generated by high-level simulation to detect bugs. Using hybrid hashing, we demonstrate an improvement in error detection latency (time elapsed from when a bug is activated to when it manifests as an observable failure) of two orders of magnitude and a threefold improvement in bug coverage compared to traditional post-silicon validation techniques. Hybrid hashing also uncovered previously unknown bugs in the CHStone benchmark suite, which is widely used by the HLS community. Hybrid hashing incurs less than 10% area overhead for the accelerator it validates with negligible performance impact, and we also introduce techniques to minimize any possible intrusiveness introduced by hybrid hashing.
Finally, our solution for post-deployment error detection is modulo-3 shadow datapaths: performing lightweight shadow computations in modulo-3 space for each main computation. We leverage the binding and scheduling flexibility of high-level synthesis to detect control errors through diverse binding and minimize area cost through intelligent checkpoint scheduling and modulo-3 reducer sharing. We introduce logic and dataflow optimizations to further reduce cost. We evaluated our technique with 12 high-level synthesis benchmarks from the arithmetic-oriented PolyBench benchmark suite using FPGA-emulated netlist-level error injection. We observe coverages of 99.1% for stuck-at faults, 99.5% for soft errors, and 99.6% for timing errors with a 25.7% area cost and negligible performance impact.
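The idea behind a modulo-3 shadow computation can be shown with a small sketch. It relies only on the standard identities (a + b) mod 3 = ((a mod 3) + (b mod 3)) mod 3 and the analogous one for multiplication. The function names are hypothetical, and the sketch assumes unbounded integers, so it ignores the overflow correction a fixed-width hardware datapath would need.

```python
def shadow_check_add(a: int, b: int, main_sum: int) -> bool:
    """Return True if main_sum is consistent with a + b in modulo-3 space."""
    shadow = ((a % 3) + (b % 3)) % 3  # lightweight 2-bit shadow computation
    return main_sum % 3 == shadow


def shadow_check_mul(a: int, b: int, main_product: int) -> bool:
    """Return True if main_product is consistent with a * b in modulo-3 space."""
    shadow = ((a % 3) * (b % 3)) % 3
    return main_product % 3 == shadow


a, b = 1234, 5678
correct_sum = a + b                     # 6912
bit_flipped_sum = correct_sum ^ (1 << 4)  # single-bit error in the main datapath

print(shadow_check_add(a, b, correct_sum))      # check passes
print(shadow_check_add(a, b, bit_flipped_sum))  # check fails: error detected

# An error that is a multiple of 3 aliases and escapes the check.
print(shadow_check_add(a, b, correct_sum + 3))
```

A single-bit flip changes the main result by a power of two, and no power of two is divisible by 3, so such a flip always changes the residue. Errors whose magnitude is a multiple of 3 alias to the same residue and go undetected by this check alone.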
Leveraging a mean error detection latency of 12.75 cycles (4150× faster than an end-result check) for soft errors, we also explore a rollback recovery method with an additional area cost of 28.0%, observing a 175× increase in reliability against soft errors.
While the area cost of our modulo shadow datapaths is much better than that of traditional modular redundancy approaches, we want to maximize the applicability of our approach. To this end, we examine gate-level architectural design for modulo arithmetic functional units. We introduce new low-cost gate-level architectures for all four key functional units in a shadow datapath: (1) a modulo reduction algorithm that generates architectures consisting entirely of full-adder standard cells; (2) minimum-area modulo adder and subtractor architectures; (3) an array-based modulo multiplier design; and (4) a modulo equality comparator that handles the residue encoding produced by the above. We compare our new functional units to the previous state-of-the-art approach and observe a 12.5% reduction in area and a 47.1% reduction in delay for a 32-bit mod-3 reducer. Our reducer costs, which tend to dominate shadow datapath costs, do not increase with larger modulo bases, and for modulo-15 and above, all of our modulo functional units have better area and delay than their previous counterparts. We also demonstrate the practicality of our approach by designing a custom shadow datapath for error detection of a multiply-accumulate functional unit, which has an area overhead of only 12% for a 32-bit main datapath and a 2-bit modulo-3 shadow datapath.
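Why a Mersenne modulus such as 3 = 2^2 - 1 suits cheap reduction can be seen in a short sketch. Because 4 ≡ 1 (mod 3), a word can be reduced by summing its 2-bit digits with an end-around (wraparound) carry. This is the generic textbook folding scheme, not the full-adder architecture developed in the dissertation; the function names are hypothetical, and treating both 0 and 3 as encodings of zero is an assumption of this sketch.

```python
def mod3_reduce(x: int, width: int = 32) -> int:
    """Reduce x to a 2-bit residue by folding 2-bit digits with wraparound carry.

    The result lies in {0, 1, 2, 3}; in this sketch 3 is a second encoding of 0.
    """
    assert 0 <= x < (1 << width)
    acc = 0
    for shift in range(0, width, 2):
        acc += (x >> shift) & 0b11        # add the next 2-bit digit
        acc = (acc & 0b11) + (acc >> 2)   # wrap the carry back in (4 ≡ 1 mod 3)
    return acc


def residues_equal(r1: int, r2: int) -> bool:
    """Compare two 2-bit residues, treating 3 and 0 as the same value."""
    return (0 if r1 == 3 else r1) == (0 if r2 == 3 else r2)


for value in (0, 1, 5, 6, 1234, 0xFFFFFFFF):
    print(value, mod3_reduce(value), value % 3)
```

The comparator accepts either encoding of zero, which is why a reducer of this style can skip a final normalization step.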
Taking our reliability solution further, we look at the bigger picture of modulo shadow datapaths combined with other solutions at different abstraction layers, looking to answer the following question: given all of the existing reliability improvement techniques for application-specific hardware accelerators, which techniques or combinations of techniques are the most cost-effective? To answer this question, we consider a soft error fault model and empirically evaluate cross-layer combinations of ABFT, EDDI, and modulo shadow datapaths in the context of high-level synthesis; parity in logic synthesis; and flip-flop hardening techniques at the physical design level. We measure the reliability benefit and the area, energy, and performance cost of each technique individually and for interesting technique combinations through FPGA-emulated fault injection and physical place-and-route. Our results show that a combination of parity and flip-flop hardening is the most cost-effective in general, with an average 1.3% area cost and 5.7% energy cost for a 50× improvement in reliability. The addition of modulo-3 shadow datapaths to this combination provides additional benefit for some applications, even without considering its combinational logic, stuck-at fault, and timing error protection benefits. We also observe new efficiency challenges for ABFT and EDDI when used for hardware accelerators.

Graduation Semester

2017-12

Type of Resource

text

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Electrical and Computer Engineering

Dissertations and Theses in Electrical and Computer Engineering
