Randomness-aware testing of machine learning-based systems
Dutta, Saikat
Permalink
https://hdl.handle.net/2142/121445
Description
- Title
- Randomness-aware testing of machine learning-based systems
- Author(s)
- Dutta, Saikat
- Issue Date
- 2023-07-05
- Director of Research (if dissertation) or Advisor (if thesis)
- Misailovic, Sasa
- Doctoral Committee Chair(s)
- Misailovic, Sasa
- Committee Member(s)
- Marinov, Darko
- Adve, Vikram
- Gligoric, Milos
- Lahiri, Shuvendu
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Machine Learning
- Software Testing
- Abstract
- Machine Learning (ML) is rapidly revolutionizing the way modern-day systems are developed. However, testing ML-based systems is challenging due to 1) the presence of non-determinism, both internal (e.g., stochastic algorithms) and external (e.g., execution environment), and 2) the absence of well-defined accuracy specifications. Most traditional software testing techniques widely used today cannot tackle these challenges because they often assume determinism and require a precise test oracle. This dissertation presents work on automated testing and debugging of ML-based systems and on improving developer-written tests in such systems. To achieve these goals, this dissertation presents principled techniques that build on mathematical foundations from probability theory and statistics to reason about the underlying non-determinism and accuracy. The presented techniques help developers to detect more bugs and to efficiently navigate trade-offs between test quality and efficiency. This dissertation presents contributions along two key directions. First, a key challenge in testing ML-based systems is generating tests that can systematically and effectively explore the input space of the algorithms under test. However, because the inputs to an ML algorithm are complex objects, like a deep neural network model and data, generating inputs that are both syntactically and semantically correct is challenging. Additionally, ML algorithms do not come with well-defined test oracles, which makes it difficult to reason about correctness. This dissertation presents systematic test generation and debugging techniques that tackle these challenges by combining techniques from programming languages, differential testing, and probabilistic reasoning. These techniques have helped detect more than 50 previously unknown bugs in ML libraries and enabled faster debugging of failures. Second, this dissertation presents techniques that improve the quality of regression tests in ML libraries. 
When writing such tests, developers often fail to adequately account for the randomness of the algorithms under test and rely on guesswork when selecting test configurations. Consequently, such regression tests often end up flaky, i.e., passing or failing non-deterministically for the same code; expensive to run; or less effective at detecting faults. This dissertation presents novel test repair techniques that combine principled statistical methods, mathematical optimization, and domain knowledge to systematically tackle these challenges. These techniques have already improved the quality of over 200 tests in over 60 open-source ML libraries, many of which are used at companies like Google, Meta, Microsoft, and Uber, as well as in many academic and scientific communities.
- Graduation Semester
- 2023-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Saikat Dutta
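The abstract notes that developer-written regression tests for stochastic algorithms become flaky when tolerances are chosen by guesswork rather than by reasoning about the underlying randomness. As a minimal illustrative sketch (not code from the dissertation; the algorithm and function names are hypothetical), the contrast can be shown with a Monte Carlo estimator: a hand-picked tolerance ignores sampling variance, while a bound scaled by the estimator's standard error makes the pass probability predictable.

```python
import random
import statistics

def monte_carlo_mean(n_samples, seed=None):
    # Hypothetical stochastic algorithm under test: estimates the mean
    # of Uniform(0, 1) (true value 0.5) by random sampling.
    rng = random.Random(seed)
    return statistics.fmean(rng.random() for _ in range(n_samples))

# Flaky style: a hand-picked tolerance that ignores sampling variance
# can pass or fail non-deterministically for the same code.
#     assert abs(monte_carlo_mean(1000) - 0.5) < 0.001

# Randomness-aware style: scale the tolerance by the standard error of
# the estimator (sigma / sqrt(n)), so the failure probability is known
# and can be made negligibly small by choosing k.
def check_estimate(n_samples, true_mean=0.5, sigma=(1 / 12) ** 0.5,
                   k=5, seed=None):
    tol = k * sigma / n_samples ** 0.5  # k standard errors
    return abs(monte_carlo_mean(n_samples, seed) - true_mean) < tol

print(check_estimate(10_000, seed=0))
```

With k = 5 standard errors, a correct implementation fails the check with probability on the order of 10^-6 per run, instead of at an unknown, configuration-dependent rate.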
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)