A Framework for Knowledge Discovery from Sparse, High-Dimensional Medical Datasets

Ramachandran, Chandrasekar

Content Files

Ramachandran_Chandrasekar.pdf

Permalink

https://hdl.handle.net/2142/14710

Description

Title

A Framework for Knowledge Discovery from Sparse, High-Dimensional Medical Datasets

Author(s)

Ramachandran, Chandrasekar

Issue Date

2010-01-06T16:40:06Z

Director of Research (if dissertation) or Advisor (if thesis)

Han, Jiawei

Doctoral Committee Chair(s)

Han, Jiawei

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Date of Ingest

2010-01-06T16:40:06Z

Keyword(s)

islet cell transplants
medical data mining
dimensionality reduction
association rule mining

Abstract

In this work, we describe SDM-Miner, a comprehensive framework for knowledge discovery from medical records. The records are created before, during, and after pancreatic islet cell transplantation on a group of diabetic patients. The knowledge discovery has two goals: selecting the variables most relevant for predicting the outcome of islet cell transplants over time, and using machine learning models to support the medical understanding of the variable relationships that lead to an insulin-free transplant outcome. The challenges lie in the temporally sparse nature of medical records and the large number of variables, which make traditional statistical analyses ineffective. Our approach to overcoming these challenges is to combine data-driven, computationally intensive modeling with statistical modeling. The framework incorporates this approach in three phases of knowledge discovery: (1) statistical data preprocessing, (2) pattern-search-based dimensionality reduction, and (3) association-rule-based and conditional-probability-based data-driven modeling. We evaluate the framework by cross-validating the machine learning models using prediction errors and the uncertainty of rule discovery. To demonstrate the novelty of the framework and its improved performance in knowledge discovery, we report results on real and synthetic datasets. Experiments on synthetic data act as a sanity check that verifies the effectiveness of our models in the absence of standard test results. The evaluation results show that, compared with previous methods, our framework yields smaller mean error as the number of variable samples decreases, higher robustness to Gaussian noise, and higher confidence and support of association rules. Furthermore, we evaluate our proposed technique with existing machine learning algorithms from the Weka toolkit and show improved performance over previous approaches.
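
The abstract reports association rules in terms of their support and confidence. As a minimal sketch of those two standard measures, and not the thesis's own implementation, the Python below computes them for one rule over a toy set of records. The record contents, the item names `feature_a` and `feature_b`, and the helper functions are hypothetical and chosen only for illustration; no values here come from the thesis data.

```python
from typing import FrozenSet, List

Record = FrozenSet[str]


def support(records: List[Record], itemset: Record) -> float:
    """Fraction of records that contain every item in `itemset`."""
    if not records:
        return 0.0
    hits = sum(1 for record in records if itemset <= record)
    return hits / len(records)


def confidence(records: List[Record], antecedent: Record, consequent: Record) -> float:
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(records, antecedent)
    if base == 0.0:
        return 0.0
    return support(records, antecedent | consequent) / base


# Hypothetical toy records: each record is the set of items observed for one case.
records = [
    frozenset({"feature_a", "feature_b", "insulin_free"}),
    frozenset({"feature_a", "insulin_free"}),
    frozenset({"feature_a", "feature_b"}),
    frozenset({"feature_b", "insulin_free"}),
]

# Rule under test: {feature_a} -> {insulin_free}
rule_support = support(records, frozenset({"feature_a", "insulin_free"}))
rule_confidence = confidence(
    records, frozenset({"feature_a"}), frozenset({"insulin_free"})
)
print(f"support={rule_support:.3f} confidence={rule_confidence:.3f}")
```

A rule with high support applies to many records, and a rule with high confidence rarely fails when its antecedent holds, which is why the two measures are usually reported together.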

Graduation Semester

2009-12

Owning Collections

Dissertations and Theses - Computer Science

Dissertations and Theses from the Siebel School of Computer Science

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois
