Navigating through the uncertainty of genotyping-by-sequencing data in polyploids
Mays, Wittney Debora
Permalink
https://hdl.handle.net/2142/109337
Description
- Title
- Navigating through the uncertainty of genotyping-by-sequencing data in polyploids
- Author(s)
- Mays, Wittney Debora
- Issue Date
- 2020-10-05
- Director of Research (if dissertation) or Advisor (if thesis)
- Sacks, Erik J
- Committee Member(s)
- Clark, Lindsay V
- Ming, Ray
- Lipka, Alexander E
- Department of Study
- Crop Sciences
- Discipline
- Bioinformatics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- bioinformatics
- variant calling
- polyploidy
- genotyping-by-sequencing
- GBS
- Abstract
- The development of genotyping-by-sequencing (GBS) methods has facilitated genomics studies in non-model species, including polyploids. Variant and genotype calling methods have been established for autopolyploids, but for a species with a complex genome, such as sugarcane, the level of uncertainty within GBS data increases, making trait mapping difficult. Furthermore, variant and genotype calling remains a challenge for both recent and ancient allopolyploids (e.g. wheat, maize, soybean, Miscanthus), particularly where the reference genome contains highly similar paralogous sequences that do not pair at meiosis. Aligning sequence tags to the appropriate position within highly duplicated reference genomes is a problem inadequately addressed by existing alignment software. Although some variant calling pipelines can discriminate a paralogous locus from a Mendelian locus, paralogous loci are typically detected only so that they can be excluded from downstream analysis. We explore the significance of eliminating paralogous loci in downstream analysis using a newly developed pipeline that sorts sequence tags to their correct alignment locations based on the novel Hind/HE statistic. The goal of this study was to evaluate the sorting pipeline's ability to align paralogous loci to the correct position with respect to the reference genome. Three studies were conducted: an analysis of a population of 400 individuals simulated based upon Triticum aestivum, a reanalysis of a previously published genome-wide study of Fusarium head blight in 273 wheat breeding lines, and a reanalysis of a previously published genome-wide study of traits associated with yield in a Miscanthus diversity panel. Results suggested that filtering sequences using the Hind/HE statistic underlying polyRAD v1.2 may lead to differences in the output of sequences. Further comparison of each output suggested that the output of the novel pipeline, polyRAD, was concentrated in gene-rich regions compared to other standard variant calling pipelines. From this study, we provide recommendations for future users of the polyRAD v1.2 variant calling pipeline. Overall, we conclude that polyRAD v1.2 is more useful for populations of outcrossing species.
- Graduation Semester
- 2020-12
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2020 Wittney Mays
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois