Randomness-aware testing of machine learning-based systems
Dutta, Saikat
Permalink
https://hdl.handle.net/2142/121445
Description
- Title
- Randomness-aware testing of machine learning-based systems
- Author(s)
- Dutta, Saikat
- Issue Date
- 2023-07-05
- Director of Research (if dissertation) or Advisor (if thesis)
- Misailovic, Sasa
- Doctoral Committee Chair(s)
- Misailovic, Sasa
- Committee Member(s)
- Marinov, Darko
- Adve, Vikram
- Gligoric, Milos
- Lahiri, Shuvendu
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Machine Learning
- Software Testing
- Abstract
- Machine Learning (ML) is rapidly revolutionizing the way modern-day systems are developed. However, testing ML-based systems is challenging due to 1) the presence of non-determinism, both internal (e.g., stochastic algorithms) and external (e.g., execution environment), and 2) the absence of well-defined accuracy specifications. Most traditional software testing techniques widely used today cannot tackle these challenges because they often assume determinism and require a precise test oracle. This dissertation presents work on automated testing and debugging of ML-based systems and on improving developer-written tests in such systems. To achieve these goals, this dissertation presents principled techniques that build on mathematical foundations from probability theory and statistics to reason about the underlying non-determinism and accuracy. The presented techniques help developers to detect more bugs and to efficiently navigate trade-offs between test quality and efficiency. This dissertation presents contributions along two key directions. First, a key challenge in testing ML-based systems is generating tests that can systematically and effectively explore the input space of the algorithms under test. However, because the inputs to an ML algorithm are complex objects, like a deep neural network model and data, generating inputs that are both syntactically and semantically correct is challenging. Additionally, ML algorithms do not come with well-defined test oracles, which makes it difficult to reason about correctness. This dissertation presents systematic test generation and debugging techniques that tackle these challenges by combining techniques from programming languages, differential testing, and probabilistic reasoning. These techniques have helped detect more than 50 previously unknown bugs in ML libraries and enabled faster debugging of failures. Second, this dissertation presents techniques that improve the quality of regression tests in ML libraries. 
When writing such tests, developers often fail to adequately account for the randomness of the algorithms under test and rely on guesswork when selecting test configurations. Consequently, such regression tests often end up flaky, i.e., passing or failing non-deterministically for the same code; expensive to run; or less effective at detecting faults. This dissertation presents novel test repair techniques that combine principled statistical methods, mathematical optimization, and domain knowledge to systematically tackle these challenges. These techniques have already improved the quality of over 200 tests in over 60 open-source ML libraries, many of which are used at companies like Google, Meta, Microsoft, and Uber, as well as in many academic and scientific communities.
- Graduation Semester
- 2023-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Saikat Dutta
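The abstract notes that developer-written regression tests for stochastic algorithms become flaky when tolerances are chosen by guesswork rather than by reasoning about the underlying randomness. As a minimal illustrative sketch (not code from the dissertation; the algorithm and function names are hypothetical), the contrast can be shown with a Monte Carlo estimator: a hand-picked tolerance ignores sampling variance, while a bound scaled by the estimator's standard error makes the pass probability predictable.

```python
import random
import statistics

def monte_carlo_mean(n_samples, seed=None):
    # Hypothetical stochastic algorithm under test: estimates the mean
    # of Uniform(0, 1) (true value 0.5) by random sampling.
    rng = random.Random(seed)
    return statistics.fmean(rng.random() for _ in range(n_samples))

# Flaky style: a hand-picked tolerance that ignores sampling variance
# can pass or fail non-deterministically for the same code.
#     assert abs(monte_carlo_mean(1000) - 0.5) < 0.001

# Randomness-aware style: scale the tolerance by the standard error of
# the estimator (sigma / sqrt(n)), so the failure probability is known
# and can be made negligibly small by choosing k.
def check_estimate(n_samples, true_mean=0.5, sigma=(1 / 12) ** 0.5,
                   k=5, seed=None):
    tol = k * sigma / n_samples ** 0.5  # k standard errors
    return abs(monte_carlo_mean(n_samples, seed) - true_mean) < tol

print(check_estimate(10_000, seed=0))
```

With k = 5 standard errors, a correct implementation fails the check with probability on the order of 10^-6 per run, instead of at an unknown, configuration-dependent rate.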
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)