Entropy-based machine learning algorithms applied to genomics and pattern recognition
Moon, Wooyoung
Permalink
https://hdl.handle.net/2142/104838
Description
- Title
- Entropy-based machine learning algorithms applied to genomics and pattern recognition
- Author(s)
- Moon, Wooyoung
- Issue Date
- 2019-04-16
- Director of Research (if dissertation) or Advisor (if thesis)
- Song, Jun S.
- Doctoral Committee Chair(s)
- Dahmen, Karin
- Committee Member(s)
- Kuehn, Seppe
- Draper, Patrick
- Department of Study
- Physics
- Discipline
- Physics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Machine Learning, Decision Trees, Convolutional Filters, Genomics, Cancer, Entropy
- Abstract
- Transcription factors (TFs) are proteins that interact with DNA to regulate the transcription of DNA to RNA and play key roles in both healthy and cancerous cells. Thus, gaining a deeper understanding of the biological factors underlying TF binding specificity is important for understanding the mechanism of oncogenesis. As large biological datasets become more readily available, machine learning (ML) algorithms have proven to be an important and useful set of tools for cancer researchers. However, there remain many areas for potential improvement in these ML models, including a higher degree of model interpretability and overall accuracy. In this thesis, we present decision tree (DT) methods applied to DNA sequence analysis that result in highly interpretable and accurate predictions. We propose a boosted decision tree (BDT) model using the binary counts of important DNA motifs to predict the binding specificity of TFs that belong to the same protein family and bind similar DNA sequences. We then introduce a novel application of Convolutional Decision Trees (CDT) and demonstrate that this approach has distinct advantages over the BDT model while still accurately predicting the binding specificity of TFs. The CDT models are trained using the Cross Entropy (CE) optimization method, a Monte Carlo optimization method based on concepts from information theory related to statistical mechanics. We then further study the CDT model as a general pattern recognition and transfer learning technique and demonstrate that this approach can learn translationally invariant patterns that lead to high classification accuracy while remaining more interpretable and learning higher quality convolutional filters compared to convolutional neural networks (CNNs).
- Graduation Semester
- 2019-05
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/104838
- Copyright and License Information
- Copyright 2019 Wooyoung Moon
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Physics
Dissertations in Physics