Navigating through the uncertainty of genotyping-by-sequencing data in polyploids

Mays, Wittney Debora

Permalink

https://hdl.handle.net/2142/109337

Description

Title

Navigating through the uncertainty of genotyping-by-sequencing data in polyploids

Author(s)

Mays, Wittney Debora

Issue Date

2020-10-05

Director of Research (if dissertation) or Advisor (if thesis)

Sacks, Erik J

Committee Member(s)

Clark, Lindsay V
Ming, Ray
Lipka, Alexander E

Department of Study

Crop Sciences

Discipline

Bioinformatics

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Date of Ingest

2021-03-05T21:36:48Z

Keyword(s)

bioinformatics
variant calling
polyploidy
genotyping-by-sequencing
GBS

Abstract

The development of genotyping-by-sequencing (GBS) methods has facilitated genomics studies in non-model species, including polyploids. Variant and genotype calling methods have been established for autopolyploids, but for a species with a complex genome, such as sugarcane, the level of uncertainty within GBS data increases, making trait mapping difficult. Furthermore, variant and genotype calling remain a challenge for both recent and ancient allopolyploids (e.g. wheat, maize, soybean, Miscanthus), particularly where the reference genome contains highly similar paralogous sequences that do not pair at meiosis. Alignment of sequence tags to the appropriate position within highly duplicated reference genomes is inadequately addressed by existing alignment software. Although some variant calling pipelines can discriminate a paralogous locus from a Mendelian locus, they typically detect paralogous loci only in order to exclude them from downstream analysis. We explore the significance of eliminating paralogous loci in downstream analysis using a newly developed pipeline that sorts sequence tags to their correct alignment locations based on the novel Hind/HE statistic. The goal of this study was to evaluate the sorting pipeline’s ability to align paralogous loci to the correct position with respect to the reference genome. Three studies were conducted: an analysis of a population of 400 individuals simulated based upon Triticum aestivum, a reanalysis of a previously published genome-wide study of Fusarium head blight in 273 wheat breeding lines, and a reanalysis of a previously published genome-wide study of traits associated with yield in a Miscanthus diversity panel. Results suggested that filtering sequences using the Hind/HE statistic underlying polyRAD v1.2 may lead to differences in the output of sequences.
Further comparison of the outputs suggested that the output of the novel pipeline, polyRAD, was more concentrated in gene-rich regions than that of other standard variant calling pipelines. From this study, we provide recommendations for future users of the polyRAD v1.2 variant calling pipeline. Overall, we suggest that polyRAD v1.2 is more useful for populations of outcrossing species.

Graduation Semester

2020-12

Type of Resource

Thesis

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Crop Sciences
