Supertree-like methods for genome-scale species tree estimation
Molloy, Erin Katherine
Permalink
https://hdl.handle.net/2142/108428
Description
- Title
- Supertree-like methods for genome-scale species tree estimation
- Author(s)
- Molloy, Erin Katherine
- Issue Date
- 2020-06-29
- Director of Research (if dissertation) or Advisor (if thesis)
- Warnow, Tandy
- Gropp, William
- Doctoral Committee Chair(s)
- Warnow, Tandy
- Committee Member(s)
- Snir, Marc
- Nakhleh, Luay
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- phylogenetics
- phylogenomics
- species tree estimation
- Multi-Species Coalescent model
- gene duplication and loss
- Abstract
- A critical step in many biological studies is the estimation of evolutionary trees (phylogenies) from genomic data. Of particular interest is the species tree, which illustrates how a set of species evolved from a common ancestor. While species trees were previously estimated from a few regions of the genome (genes), it is now widely recognized that biological processes can cause the evolutionary histories of individual genes to differ from each other and from the species tree. This heterogeneity across the genome is phylogenetic signal that can be leveraged to estimate species evolution with greater accuracy. Hence, species tree estimation is expected to be greatly aided by current large-scale sequencing efforts, including the 5000 Insect Genomes Project, the 10000 Plant Genomes Project, the (~60000) Vertebrate Genomes Project, and the Earth BioGenome Project, which aims to assemble genomes (or at least genome-scale data) for 1.5 million eukaryotic species in the next ten years. To analyze these forthcoming datasets, species tree estimation methods must scale to thousands of species and tens of thousands of genes; however, many of the current leading methods, which are heuristics for NP-hard optimization problems, can be prohibitively expensive on datasets of this size. In this dissertation, we argue that new methods are needed to enable scalable and statistically rigorous species tree estimation pipelines; we then seek to address this challenge through the introduction of three supertree-like methods: NJMerge, TreeMerge, and FastMulRFS. For these methods, we present theoretical results (worst-case running time analyses and proofs of statistical consistency) as well as empirical results on simulated datasets (and a fungal dataset for FastMulRFS). Overall, these methods enable statistically consistent species tree estimation pipelines that achieve comparable accuracy to the dominant optimization-based approaches while dramatically reducing running time.
- Graduation Semester
- 2020-08
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/108428
- Copyright and License Information
- Copyright 2020 Erin Molloy
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science