Integrating heterogeneous data into electronic medical record analysis
Huang, Edward W.
Permalink
https://hdl.handle.net/2142/104778
Description
- Title
- Integrating heterogeneous data into electronic medical record analysis
- Author(s)
- Huang, Edward W.
- Issue Date
- 2019-04-16
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhai, ChengXiang
- Doctoral Committee Chair(s)
- Zhai, ChengXiang
- Committee Member(s)
- Farnoud, Farzad
- Campbell, Roy H.
- Peng, Jian
- Sinha, Saurabh
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Electronic medical records
- data mining
- knowledge graph
- heterogeneous data
- Abstract
- Electronic medical records (EMRs) are the digital equivalent of paper records at a clinician's office. They contain patient information such as treatment and medical history, and have been shown to have a wide variety of benefits. However, EMRs typically contain a multitude of diverse data, including images, doctor notes, medical test results, and genomic data. This heterogeneity generates high dimensionality and data sparsity, which are two of the most prevalent culprits that exacerbate already difficult computational problems. Additionally, domain-specific characteristics, such as the existence of synonyms in the medical vocabulary, introduce ambiguity. This can further reduce the data mining potential of EMRs. This thesis is a systematic study that addresses these issues associated with EMRs. In particular, I utilized heterogeneous data sources that are typically incompatible, and then developed frameworks in which these data sources complement one another. As a result, these methods have the potential for direct clinical translation, paving the way for improving healthcare from a data-driven perspective. To improve a variety of downstream healthcare applications, such as patient subcategorization, survival analysis, and visualization, I used external networks of domain knowledge consisting of drug-symptom relationships, protein-protein interactions, and genetic information to enhance patient records. I found that this enhancement process increased the data mining capabilities as well as the interpretability of the EMRs. To improve EMR retrieval systems, I developed a query expansion method that frames symptoms and treatments as two different languages. I found that a topic modeling method that follows this dual-language framework yielded the highest performance. Lastly, I showed that due to pathological similarities, jointly studying Alzheimer's disease and Parkinson's disease resulted in higher computational power by effectively increasing the size of the training datasets. This allowed for the accurate prediction of the onset of dementia in both diseases. Each of these results can lay the groundwork for applications that have the potential to be implemented directly in clinical practice, improving the safety and quality of patient care.
- Graduation Semester
- 2019-05
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/104778
- Copyright and License Information
- Copyright 2019 Edward W Huang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science