GeneSet MAPR: Characterization of gene sets through heterogeneous network patterns
Linkowski, Gregory
Permalink
https://hdl.handle.net/2142/101057
Description
- Title
- GeneSet MAPR: Characterization of gene sets through heterogeneous network patterns
- Author(s)
- Linkowski, Gregory
- Issue Date
- 2018-04-24
- Director of Research (if dissertation) or Advisor (if thesis)
- Vasudevan, Shobha
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Date of Ingest
- 2018-09-04T20:27:26Z
- Keyword(s)
- graph theory
- network
- meta-paths
- bioinformatics
- machine learning
- pattern recognition
- big data
- statistical analysis
- p-value
- Abstract
- Often, machine learning and big data concepts are applied to problems without a proper appreciation of their limitations or domain context. At the same time, there is a growing appreciation for the ability of networks to represent more complex connections between data points than previous structures. However, established machine learning approaches rarely take advantage of such structures and must be adapted. We present here a method that utilizes patterns of connections within heterogeneous networks to score items by their similarity to an input set. We apply the idea of meta-paths as an abstraction to counteract typical big data problems of noise and overfitting. We also aim to demystify the black-box nature of machine learning by providing intuitive feedback about why items are considered similar. While the method presented here is generalizable to any domain, the specific examples explored are within the genomics domain. The final tool, GeneSet MAPR, is especially useful in a domain with little ground truth and a huge volume of noisy, uncertain data. We show that GeneSet MAPR performs better at discovering related but concealed data points than an approach using the same data without abstraction, as well as an established state-of-the-art approach that works on a network but ignores the heterogeneous patterns. It does this while providing details the other methods cannot.
- Graduation Semester
- 2018-05
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/101057
- Copyright and License Information
- Copyright 2018 Gregory Linkowski
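The meta-path idea named in the abstract can be illustrated with a minimal sketch. This is not the GeneSet MAPR implementation: the toy edge list, the edge type names (`coexpression`, `pathway`), and the functions `build_adjacency`, `metapath_counts`, and `score_candidates` are all hypothetical, and the scoring rule (a raw count of walks that end in the input set) is an assumption chosen for simplicity. It only shows how a sequence of edge types in a heterogeneous network can be used to score candidate items against an input set.

```python
from collections import defaultdict

# Hypothetical toy network: (node_a, node_b, edge_type); edges are undirected.
EDGES = [
    ("g1", "g2", "coexpression"),
    ("g2", "g3", "coexpression"),
    ("g1", "g4", "pathway"),
    ("g3", "g4", "pathway"),
    ("g4", "g5", "coexpression"),
]


def build_adjacency(edges):
    """Return a mapping edge_type -> node -> set of neighbours."""
    adj = defaultdict(lambda: defaultdict(set))
    for a, b, etype in edges:
        adj[etype][a].add(b)
        adj[etype][b].add(a)
    return adj


def metapath_counts(adj, start, metapath):
    """Count walks from `start` that follow the edge types in `metapath`.

    Returns a dict mapping each reachable end node to its walk count.
    """
    counts = {start: 1}
    for etype in metapath:
        next_counts = defaultdict(int)
        for node, n in counts.items():
            for neighbour in adj[etype][node]:
                next_counts[neighbour] += n
        counts = dict(next_counts)
    return counts


def score_candidates(edges, input_set, metapath):
    """Score every node outside `input_set` by walks that end inside it."""
    adj = build_adjacency(edges)
    nodes = {n for a, b, _ in edges for n in (a, b)}
    scores = {}
    for candidate in nodes - set(input_set):
        counts = metapath_counts(adj, candidate, metapath)
        scores[candidate] = sum(counts.get(g, 0) for g in input_set)
    return scores


if __name__ == "__main__":
    # Assumed input set and meta-path, for illustration only.
    print(score_candidates(EDGES, {"g1", "g3"}, ["coexpression", "pathway"]))
```

Because each score is tied to a named sequence of edge types, a high score can be read back as "this candidate reaches the input set through this connection pattern", which is the kind of interpretable feedback the abstract describes.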
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer Engineering