Statistical learning approaches for obtaining interpretable reduced representations of multimodal sequencing datasets
Leistico, Jacob R
Permalink
https://hdl.handle.net/2142/115458
Description
- Title
- Statistical learning approaches for obtaining interpretable reduced representations of multimodal sequencing datasets
- Author(s)
- Leistico, Jacob R
- Issue Date
- 2022-04-19
- Director of Research (if dissertation) or Advisor (if thesis)
- Song, Jun S
- Doctoral Committee Chair(s)
- Gruebele, Martin
- Committee Member(s)
- Kahn, Yonatan
- Kuehn, Seppe
- Department of Study
- Physics
- Discipline
- Physics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Epigenetic
- multimodal
- tensor decomposition
- single-cell
- Abstract
- The highly specialized functions of cells in a multicellular organism are effected by the proteins contained within and produced by the cell. Establishing the distinct protein composition of specialized cell types is accomplished largely through the regulation of gene transcription. Several epigenetic mechanisms, including chemical modifications of histone proteins, collectively function to establish and maintain the transcriptional state necessary for the specialized functions of the cell. The collection of RNA transcript abundance, protein content, and epigenetic features creates a multimodal description of cell identity. Advances in high-throughput sequencing technologies are increasingly enabling the profiling of multiple cellular modalities in bulk tissues and single cells. Computational algorithms that can integrate information from the multiple profiled modalities are needed to fully utilize the resulting information in classifying tissue or cell types and identifying the salient molecular features. This thesis presents rigorous computational methods for integrating multimodal genomic data to extract biologically relevant features that can help identify tissue or cell types. These methods share the common approach of obtaining reduced representations of the multimodal sequencing datasets that can be used for downstream analysis. The first method applies a higher-order singular value decomposition (HOSVD) to decompose an epigenetic data tensor obtained from profiling multiple histone modifications in human tissue samples. The reduced representations obtained with this method are shown to capture features differentiating disease conditions, mutational subtypes, and tissue types. Additionally, these representations are connected to covarying profiles in genomic location space that are shown to identify recurrent epigenetic differences between disease conditions.
The second method applies the spectral clustering on multilayer graphs (SCML) algorithm to cluster cells from a multimodal single-cell sequencing dataset that simultaneously profiles mRNA transcripts and selected surface proteins. Clusters obtained with the SCML algorithm are shown to compromise between the community structure obtained from the mRNA transcripts and the protein surface markers. The results presented in this thesis demonstrate the benefits of integrating multiple molecular modalities in identifying cell and tissue types.
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Jacob Leistico
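
For readers unfamiliar with the HOSVD named in the abstract, the following is a minimal, generic sketch of a truncated higher-order SVD in NumPy. It is not the thesis's implementation. The tensor layout (samples x histone marks x genomic bins), the random data, the chosen ranks, and the helper names `unfold`, `hosvd`, and `reconstruct` are all illustrative assumptions.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor so that the given mode indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `matrix` into `tensor` along axis `mode` (mode-n product)."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)


def hosvd(tensor, ranks=None):
    """Return (core, factors); `ranks` optionally truncates each mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding give that mode's factor.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        keep = u.shape[1] if ranks is None else ranks[mode]
        factors.append(u[:, :keep])
    core = tensor
    for mode, u in enumerate(factors):
        # Project each mode onto its factor basis to obtain the core tensor.
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the (possibly approximated) tensor from core and factors."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical data: 6 samples x 4 histone marks x 50 genomic bins.
rng = np.random.default_rng(0)
tensor = rng.normal(size=(6, 4, 50))

core, factors = hosvd(tensor)                       # no truncation
small_core, small_factors = hosvd(tensor, ranks=(2, 2, 5))

# Rows of the sample-mode factor act as reduced sample representations.
sample_embedding = small_factors[0]
```

In this sketch, the rows of `small_factors[0]` play the role of a reduced representation per sample, while `small_factors[2]` holds the corresponding profiles over genomic bins. Without truncation the decomposition reconstructs the input exactly (up to floating-point error).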
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois