Improving gene trees without more data
Gupta, Ashu
Permalink
https://hdl.handle.net/2142/90687
Description
- Title
- Improving gene trees without more data
- Author(s)
- Gupta, Ashu
- Issue Date
- 2016-04-28
- Director of Research (if dissertation) or Advisor (if thesis)
- Warnow, Tandy
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Gene trees
- Species trees
- Binning
- Multi-locus bootstrapping (MLBS)
- BestML
- Gene tree estimation
- Species tree estimation
- Low phylogenetic signal
- Abstract
- Species tree and gene tree estimation from sequence data are two steps in many biological analyses. Computational challenges and limited amounts of data often make it difficult to estimate highly accurate phylogenetic trees. Moreover, the gene alignments used to estimate trees on individual loci often have low phylogenetic signal (e.g., short alignment length), resulting in poorly estimated gene trees. Species tree estimation, on the other hand, is challenged by individual loci having different evolutionary histories, caused by a biological phenomenon known as incomplete lineage sorting (ILS). In the presence of ILS, summary methods such as MP-EST, ASTRAL2, and ASTRID are often used to estimate the species tree from gene trees. Summary methods operate by combining estimated gene trees and thus suffer in the presence of low phylogenetic signal. To tackle this problem, the Statistical Binning and Weighted Statistical Binning pipelines were designed to improve gene tree estimation, which in turn can improve species tree estimation. Experimental studies showed that these pipelines improved gene tree and species tree estimation. However, these studies tested the statistical binning and weighted statistical binning pipelines only with multi-locus bootstrapping (MLBS) and not with BestML, where MLBS and BestML are different ways to run a phylogenetic pipeline. In this thesis, a novel phylogenetic pipeline named WSB+WQMC is proposed. This pipeline shares several design features with the weighted statistical binning pipeline (referred to as WSB+CAML in this thesis) but has some other desirable properties. The WSB+WQMC pipeline is also shown to be statistically consistent under the GTR+MSC model when a slightly different version of WQMC is used. In this study, WSB+WQMC was evaluated and compared with the WSB+CAML pipeline on various simulated datasets using BestML analysis.
Most of the trends seen in MLBS analyses were also observed for WSB+WQMC and WSB+CAML in BestML analyses, with some important differences. WSB+WQMC substantially improved the accuracy of gene tree estimation and of species tree estimation using ASTRAL2 and ASTRID on most datasets with low, medium, and moderately high levels of ILS. Compared to WSB+CAML, WSB+WQMC computed less accurate gene trees and species trees in certain model conditions with low and medium levels of ILS. However, WSB+WQMC was found to be better than or at least as accurate as WSB+CAML in computing gene trees and species trees on all datasets with moderately high and high ILS levels. WSB+WQMC is also shown to be better at estimating gene trees on certain medium and low ILS datasets. Thus, WSB+WQMC is a potential alternative to WSB+CAML for gene tree and species tree estimation in the presence of low phylogenetic signal.
- Graduation Semester
- 2016-05
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/90687
- Copyright and License Information
- Copyright 2016 Ashu Gupta
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois