Multiple-implementation testing of supervised learning software

Alebiosu, Oreoluwa

Permalink

https://hdl.handle.net/2142/97479

Description

Title

Multiple-implementation testing of supervised learning software

Author(s)

Alebiosu, Oreoluwa

Issue Date

2017-04-26

Director of Research (if dissertation) or Advisor (if thesis)

Xie, Tao

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Date of Ingest

2017-08-10T19:16:11Z

Keyword(s)

Machine learning
Multiple-implementation testing
Differential testing
Supervised learning
Multiple-implementation monitoring
Software monitoring
k-Nearest neighbor (kNN)
Naive Bayes
Software testing
Pseudo oracle
Algorithm configurations
Percentage threshold
Black box
Test oracle
Multiple implementation
NaiveBayes

Abstract

Machine Learning (ML) software, used to implement an ML algorithm, is widely used in many application domains such as the financial, business, and engineering domains. Faults in ML software can cause substantial losses in these domains. Thus, it is critical to test ML software effectively to detect and eliminate its faults. However, testing ML software is difficult, especially in producing test oracles for checking behavioral correctness (such as expected properties or expected test outputs). To tackle the test-oracle problem, this thesis presents a novel black-box approach to multiple-implementation testing of supervised learning software. The insight underlying the approach is that there can be multiple independently written implementations of a supervised learning algorithm, and a majority of them may produce the expected output for a test input (even if none of these implementations is fault-free). In particular, the proposed approach derives a pseudo oracle for a test input by running the test input on n implementations of the supervised learning algorithm and then taking as the expected output the common test output produced by a majority (determined by a percentage threshold) of these n implementations. The proposed approach includes techniques to address two challenges in multiple-implementation testing (and in testing supervised learning software generally): defining test cases for supervised learning software and resolving inconsistent algorithm configurations across implementations. In addition, to improve the dependability of supervised learning software during in-field usage while incurring low runtime overhead, the approach includes a multiple-implementation monitoring technique.
The evaluations of the proposed approach show that multiple-implementation testing is effective in detecting real faults in real-world ML software (even popularly used ones), including 5 faults from 10 NaiveBayes implementations and 4 faults from 20 k-nearest neighbor implementations, and that the proposed multiple-implementation monitoring technique substantially reduces the need for running multiple implementations while maintaining high prediction accuracy.
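The pseudo-oracle step summarized in the abstract (run one test input on n implementations, then accept the output shared by a percentage-threshold majority) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the names `pseudo_oracle`, `run_test`, and `one_nn`, the callable interface for an implementation, the (training data, test instance) shape of a test case, and the default threshold of 0.8 are all hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

# Hypothetical interface: an implementation takes (train_x, train_y, test_x) and returns a label.
Implementation = Callable[[Sequence[Sequence[float]], Sequence[Hashable], Sequence[float]], Hashable]


def pseudo_oracle(outputs: Sequence[Hashable], threshold: float) -> Optional[Hashable]:
    """Return the output shared by at least `threshold` of the n outputs, or None."""
    # Assumption: a threshold above 0.5 guarantees that at most one output can qualify.
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1.0]")
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count / len(outputs) >= threshold else None


def run_test(
    impls: Sequence[Implementation],
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    test_x: Sequence[float],
    threshold: float = 0.8,  # hypothetical default percentage threshold
) -> Tuple[Optional[Hashable], List[int]]:
    """Run one test case on every implementation; return (pseudo oracle, suspect indices)."""
    outputs = [impl(train_x, train_y, test_x) for impl in impls]
    oracle = pseudo_oracle(outputs, threshold)
    if oracle is None:
        return None, []  # no output reached the threshold: the test is inconclusive
    # Implementations that disagree with the majority are reported as suspects.
    return oracle, [i for i, out in enumerate(outputs) if out != oracle]


def one_nn(train_x, train_y, test_x, pick=min):
    """Toy 1-nearest-neighbor classifier; pick=max simulates a faulty implementation."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, test_x)) for row in train_x]
    return train_y[dists.index(pick(dists))]


# Demo: four correct toy implementations and one deliberately faulty one.
impls = [one_nn] * 4 + [lambda tx, ty, x: one_nn(tx, ty, x, pick=max)]
print(run_test(impls, [[0.0], [10.0]], ["a", "b"], [1.0]))  # ('a', [4])
```

Requiring the threshold to exceed 50% is a design choice of this sketch: it ensures at most one output can qualify, so the pseudo oracle is never ambiguous. When no output reaches the threshold, the sketch treats the test as inconclusive rather than blaming any implementation.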

Graduation Semester

2017-05

Type of Resource

text

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Siebel School of Computer Science
