Understanding information at the biomolecular level using statistics and machine learning
Luu, Alan M.
Permalink
https://hdl.handle.net/2142/115452
Description
- Title
- Understanding information at the biomolecular level using statistics and machine learning
- Author(s)
- Luu, Alan M.
- Issue Date
- 2022-04-14
- Director of Research (if dissertation) or Advisor (if thesis)
- Song, Jun
- Doctoral Committee Chair(s)
- Maslov, Sergei
- Committee Member(s)
- Golding, Ido
- Perez-Pinera, Pablo
- Department of Study
- Physics
- Discipline
- Physics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- biology
- statistics
- machine learning
- genomics
- Central Dogma
- DNA
- RNA
- protein
- information
- genome editing
- CRISPR
- next-generation sequencing
- deep learning
- genetic engineering
- keratinocyte
- cancer
- immune system
- CRISPR base editor
- squamous cell carcinoma
- basal cell carcinoma
- RNA-Seq
- scRNA-Seq
- convolutional neural network
- deep metric learning
- multimodal learning
- T-Cell receptors
- neural network interpretation
- epitope
- MCMC
- Abstract
- The Central Dogma of molecular biology states that DNA is transcribed into RNA, which is then translated into protein. The majority of cellular functions can trace their origins to the various stages of the Central Dogma, making it a central pillar of our understanding of biological systems. However, this description of molecular biology only scratches the surface of biological function; the information encoded in these biomolecules and the way this information is processed are also crucial to understanding biological phenomena. With the recent development of genome editing tools and next-generation sequencing methods, researchers finally possess the means to measure and even control information at the genomic, transcriptomic, and proteomic levels. Furthermore, the accumulation of large datasets with new modalities of data has opened up opportunities to develop new methods of biological data analysis based on machine learning. This thesis documents our efforts to develop and apply statistical and machine learning techniques to analyze genetic engineering techniques and to leverage data from novel next-generation sequencing assays to shed new light on previously studied biological phenomena, including the origin of keratinocyte cancers and immune system recognition of pathogens. First, we investigated the use of genome editing tools to engineer transcription. We performed statistical analysis on RNA sequence data and genomic DNA sequence data to show that exons were successfully excluded from RNA transcripts after the splice signal was modified with CRISPR base editors, and to quantify base editing rates at on-target and off-target DNA sites. Additionally, we performed genome-wide prediction of the editability of exons and developed a web interface to facilitate use of the technology.
Second, we investigated the cell of origin of two cancers, squamous cell carcinoma and basal cell carcinoma, by developing a similarity metric between bulk RNA expression levels of cancer and single-cell RNA sequence data of keratinocytes at various stages of their differentiation process. Third, we used a convolutional neural network model inspired by concepts from deep metric learning and multimodal learning to predict binding between T-cell receptors (TCRs) and antigen epitopes with accuracy comparable to, or greater than, the state of the art. We used a neural network interpretation method to identify positions in the TCR important for binding and used crystal structure data to show that proximity of the TCR to the epitope may not be a good proxy for importance in determining epitope specificity.
- Graduation Semester
- 2022-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Alan Luu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois