Methods to summarize and reduce the solution space of tumor phylogeny inference
Aguse, Nuraini Binti
Permalink
https://hdl.handle.net/2142/108034
Description
- Title
- Methods to summarize and reduce the solution space of tumor phylogeny inference
- Author(s)
- Aguse, Nuraini Binti
- Issue Date
- 2020-05-12
- Director of Research (if dissertation) or Advisor (if thesis)
- El-Kebir, Mohammed
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Tumor Phylogeny
- Summary
- Single-cell sequencing
- Abstract
- Cancer phylogenies are key to studying tumorigenesis and have clinical implications. Due to the heterogeneous nature of cancer and the limitations of current sequencing technology, existing cancer phylogeny inference methods identify a large solution space of plausible phylogenies. To facilitate downstream analysis, one can either summarize the set of cancer phylogenies or use additional data to eliminate trees and reduce the solution space. Current summary methods are limited to a single consensus tree or graph and may miss important topological features that are present in different subsets of candidate trees. On the other hand, while single-cell sequencing (SCS) provides the data needed to reduce the solution space, it may become prohibitively costly as the number of cells to sequence increases. In this thesis, we first introduce the Multiple Consensus Tree (MCT) problem to simultaneously cluster the trees in the solution space and infer a consensus tree for each cluster. We show that MCT is NP-hard, and present an exact algorithm based on mixed integer linear programming (MILP) and a heuristic algorithm that efficiently identifies high-quality consensus trees. We demonstrate the applicability of our methods on both simulated and real data, showing that our approach selects the number of clusters depending on the complexity of the solution space. Next, we introduce PhyDOSE, a method that uses bulk sequencing data to strategically optimize the design of follow-up single-cell sequencing experiments. We incorporate distinguishing features (features that uniquely identify a tree) into a probabilistic model that infers the number of cells to sequence so as to confidently reconstruct the phylogeny of the tumor. We validate PhyDOSE using simulations and a retrospective analysis of a childhood leukemia patient, concluding that PhyDOSE's computed number of cells resolves tree ambiguity even in the presence of typical single-cell sequencing errors. We also conduct a retrospective analysis on an acute myeloid leukemia cohort, demonstrating the potential for a significant reduction in the number of cells to sequence. In a prospective analysis, we demonstrate that a small number of cells suffices to disambiguate the solution space of trees in a recent lung cancer cohort. Finally, we provide an R package and a web interface to make PhyDOSE easy to use.
- Graduation Semester
- 2020-05
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/108034
- Copyright and License Information
- Copyright 2020 Nuraini Aguse
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science