Empirical accuracy bounds for next-generation sequencing variant calling workflows
Stephens, Zachary Daniel
Permalink
https://hdl.handle.net/2142/78801
Description
- Title
- Empirical accuracy bounds for next-generation sequencing variant calling workflows
- Author(s)
- Stephens, Zachary Daniel
- Issue Date
- 2015-05-01
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Date of Ingest
- 2015-07-22T22:46:09Z
- Keyword(s)
- Next-Generation Sequencing (NGS) Accuracy Benchmarking
- Next-Generation Error Analysis Toolkit (NEAT)
- Next-Generation Sequencing (NGS) Accuracy Bounds
- Abstract
- This thesis investigates the accuracy bounds imposed on alignment-based variant calling workflows by the inherent uncertainties introduced by sequencing platforms. We use simulated data to empirically quantify the maximum alignment and variant detection accuracy that can be expected from a workflow. Short-read sequencers inherently cannot produce reads that map uniquely to every position of the human reference genome, so some alignment errors are inevitable. We analyze the repetitive content of several organisms and estimate the maximum attainable alignment accuracy as a function of read length. Additionally, we show that paired-end sequencing with large insert sizes (also referred to as "mate-pair" sequencing) is capable of mapping >99% of the human genome. We have developed a set of tools, NEAT (Next-generation Error Analysis Toolkit), which we use to create fault-injected genomic datasets. Our experiments use these datasets to show how the behavior of BWA and GATK workflows changes as a function of read length, error rate, quality scores, error type, and mutation type. We use these results to quantify the performance gains that can be expected by altering these properties of an NGS dataset. Our results highlight the sensitivity of alignment software to read length and error rate, and the sensitivity of variant callers to quality scores and structural variation.
- Graduation Semester
- 2015-5
- Type of Resource
- text
- Copyright and License Information
- Copyright 2015 Zachary Stephens
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer Engineering