Translation validation for compilation verification

Kasampalis, Theodoros

Permalink

https://hdl.handle.net/2142/110460

Description

Title

Translation validation for compilation verification

Author(s)

Kasampalis, Theodoros

Issue Date

2021-04-13

Director of Research (if dissertation) or Advisor (if thesis)

Adve, Vikram S

Doctoral Committee Chair(s)

Adve, Vikram S

Committee Member(s)

Rosu, Grigore
Gunter, Elsa L
Regehr, John

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Program Equivalence
Compilers
Formal Verification
LLVM
K Framework
Translation Validation
Instruction Selection
Register Allocation

Abstract

Modern optimizing compilers such as LLVM and GCC are huge and complex, and mature releases routinely ship with undetected bugs. Beyond the harm to software development, the lack of formal correctness guarantees for the compilation process seriously limits the guarantees that other software systems can provide, since the compiler that generates the final executable cannot be trusted. These circumstances have motivated broad interest in compilation verification: providing a formal guarantee that a compilation of a program is correct. Translation Validation is a commonly used compilation verification technique that aims to prove the correctness of a single instance of compilation by considering only the specific input and output programs and treating the compiler mostly as a black box. Translation Validation techniques are well suited to the compilation verification problem because they can be composed to validate a sequence of compilation steps, they can easily be retrofitted to existing compilers, and they can be maintained independently of the compiler itself by a separate team of formal methods experts. The basic components of a Translation Validation system are (1) a formal notion of program equivalence, (2) a verification condition generator that generates a relation between program points and variables in the input and output programs, and (3) a proof system that accepts the verification conditions, generates a machine-checkable equivalence proof, and checks the proof for correctness. Ideally, such a system is completely agnostic to the specifics of the transformation from the input to the output and independent of the input/output languages. This allows the same system to be reused across the many transformation and translation passes found in modern compilers.
However, the state of the art falls short of this ideal: most existing systems are custom-tailored to a particular sequence of transformations and, moreover, specialized for a specific, common intermediate language shared by the input and output programs. The overall goal of this work is to show that it is possible to develop a (mostly) language-independent, transformation-agnostic translation validation system with support for different input/output languages for an optimizing, production-quality compiler. In this thesis, we present such a system as well as the theoretical and practical advances needed to arrive at it. First, we present a formal framework for program equivalence checking that is transformation-agnostic and language-independent. This framework can serve as-is as the proof system for any number of Translation Validation systems targeting different transformation and/or translation phases within an existing compiler. The basis of the framework is cut-bisimulation, a rigorous formalization of weak bisimulation variants that generalizes the various (sometimes ad hoc) notions of program equivalence found in the literature. We develop a program equivalence checking algorithm that proves two programs equivalent by reducing a proposed relation between corresponding program states to a cut-bisimulation relation. We implement this algorithm in KEQ, a new tool for checking program equivalence that accepts the operational semantics of the input and output languages as parameters and is independent of the transformation used to generate the output. This is the first program equivalence checking tool known to the authors that is language-parametric rather than containing hard-coded language semantics, as is the norm in the literature. Then, we use KEQ as the equivalence checker for two different Translation Validation systems targeting two phases of the LLVM compiler: the Instruction Selection phase and the Register Allocation phase.
The two systems share the same notion of equivalence (cut-bisimulation), the same proof system (KEQ), and the same semantic definitions for the input/output languages (LLVM IR and x86-64-based Machine IR), which are separate artifacts and are not hard-coded into the logic of the systems. The only transformation-specific components are the two verification condition generators. The one for Instruction Selection requires minimal support from the compiler in the form of compiler-generated hints, while the one for Register Allocation employs a novel inference algorithm for register allocation and related optimizations. These systems were evaluated on the GCC SPEC 2006 benchmark, where they correctly validated 4331 / 4732 (91.52%) and 4574 / 4732 (96.67%) functions with supported features, respectively.
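
As an illustration of the kind of check the abstract describes, the following is a minimal explicit-state sketch of testing whether a proposed relation between "cut" states of two programs is a cut-bisimulation. It is not KEQ's algorithm, which works symbolically over language semantics; all names here (`cut_successors`, `is_cut_bisimulation`, the toy state labels) are hypothetical, and the sketch assumes finite transition systems in which every cycle passes through a cut state.

```python
from typing import Dict, Hashable, Set, Tuple

State = Hashable
Transitions = Dict[State, Set[State]]


def cut_successors(trans: Transitions, cuts: Set[State], start: State) -> Set[State]:
    """Cut states reachable from `start` through non-cut intermediate states only."""
    found: Set[State] = set()
    seen: Set[State] = set()
    stack = list(trans.get(start, ()))
    while stack:
        state = stack.pop()
        if state in cuts:
            found.add(state)      # stop at the first cut state on each path
            continue
        if state in seen:
            continue              # guard against revisiting non-cut states
        seen.add(state)
        stack.extend(trans.get(state, ()))
    return found


def is_cut_bisimulation(
    trans1: Transitions, cuts1: Set[State],
    trans2: Transitions, cuts2: Set[State],
    relation: Set[Tuple[State, State]],
) -> bool:
    """Check that every related pair can match each other's cut-successors."""
    for a, b in relation:
        succ_a = cut_successors(trans1, cuts1, a)
        succ_b = cut_successors(trans2, cuts2, b)
        # Every cut-successor of `a` needs a related cut-successor of `b` ...
        if not all(any((x, y) in relation for y in succ_b) for x in succ_a):
            return False
        # ... and every cut-successor of `b` needs a related one of `a`.
        if not all(any((x, y) in relation for x in succ_a) for y in succ_b):
            return False
    return True


# Toy input program: a loop with a one-step body.
SRC: Transitions = {
    "s_entry": {"s_loop"},
    "s_loop": {"s_body", "s_exit"},
    "s_body": {"s_loop"},
}
# Toy output program: same loop, but with extra intermediate steps.
TGT: Transitions = {
    "t_entry": {"t_tmp"},
    "t_tmp": {"t_loop"},
    "t_loop": {"t_b1", "t_exit"},
    "t_b1": {"t_b2"},
    "t_b2": {"t_loop"},
}
CUTS_SRC = {"s_entry", "s_loop", "s_exit"}
CUTS_TGT = {"t_entry", "t_loop", "t_exit"}

# Proposed relation, playing the role of the verification conditions.
GOOD = {("s_entry", "t_entry"), ("s_loop", "t_loop"), ("s_exit", "t_exit")}
# Same relation with the loop-head pair missing.
BAD = GOOD - {("s_loop", "t_loop")}

GOOD_RESULT = is_cut_bisimulation(SRC, CUTS_SRC, TGT, CUTS_TGT, GOOD)
BAD_RESULT = is_cut_bisimulation(SRC, CUTS_SRC, TGT, CUTS_TGT, BAD)
```

In this toy setting the two programs take different numbers of steps between cut states, yet the relation `GOOD` is accepted because only the cut states are compared; dropping the loop-head pair (`BAD`) makes the check fail. A real checker would also compare the values of related variables at each pair of cut states, which this sketch omits.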

Graduation Semester

2021-05

Type of Resource

Thesis

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Dept. of Computer Science
