Computational methods for genomic variant calling and analysis
Wickland, Daniel Paul
Permalink
https://hdl.handle.net/2142/105742
Description
- Title
- Computational methods for genomic variant calling and analysis
- Author(s)
- Wickland, Daniel Paul
- Issue Date
- 2019-05-24
- Director of Research (if dissertation) or Advisor (if thesis)
- Hudson, Matthew E
- Doctoral Committee Chair(s)
- Hudson, Matthew E
- Committee Member(s)
- Asmann, Yan W
- Mainzer, Liudmila S
- Moose, Stephen P
- Vodkin, Lila O
- Department of Study
- Crop Sciences
- Discipline
- Informatics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- variant calling, Alzheimer's disease, batch effect, genotyping-by-sequencing
- Abstract
- The development of short-read, next-generation sequencing (NGS) has revolutionized biological research, agriculture and medicine, enabling innovations such as genomic selection to raise crop yields and precision medicine to diagnose and treat disease. The genetic polymorphisms identified by this high-throughput sequencing can serve as markers for association with phenotypic traits. Variant calling refers to the process of detecting genetic polymorphisms based on analysis of genome sequence data output by NGS technology. The projects described here investigate these analysis methods. Chapter One reviews variant calling and its application to human and plant genomic data. It opens by detailing the generation of sequence reads from biological samples and the conversion of those reads to meaningful data, emphasizing the importance of tool selection for analysis. Next, the use of sequencing to identify genetic risk factors in the context of Alzheimer’s disease is reviewed. The chapter concludes by describing the application of sequencing to analysis of plant genomes. Chapter Two presents a study of the impact of batch effect and study design on identification of genetic risk factors in human sequencing data. Sequencing-based searches for disease-associated variants require large sample sizes to achieve sufficient statistical power, but they often entail batch effects and biases from study design, both of which hinder the ability to detect true genotype-trait associations. We studied batch effects and confounding variables in whole-exome data from the Alzheimer’s Disease Sequencing Project and demonstrated that both significantly impacted the association analysis. In particular, we identified variants with novel disease associations that may have been influenced by population stratification and a confounding effect of age. Chapter Three reports a comparison of genotyping-by-sequencing (GBS) analysis methods on plant data. 
As a reduced-representation sequencing method to identify genetic variants and quickly genotype samples, GBS produces extensive missing data and requires complex bioinformatics analysis, particularly in the context of plants, which have highly variable ploidy and repeat content. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics pipeline that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. A comparison of five GBS pipelines using low-coverage sequence data from soybean demonstrated that GB-eaSy rapidly and accurately identified the greatest number of variants. In addition, the unexpectedly low convergence between the five analysis methods but generally high accuracy indicated that the workflows arrived at largely complementary sets of valid variant calls.
- Graduation Semester
- 2019-08
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/105742
- Copyright and License Information
- Copyright 2019 Daniel P. Wickland
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois