Data Mining via Support Vector Machines: Scalability, Applicability, and Interpretability
Yu, Hwan-Jo
Permalink
https://hdl.handle.net/2142/81644
Description
Title
Data Mining via Support Vector Machines: Scalability, Applicability, and Interpretability
Author(s)
Yu, Hwan-Jo
Issue Date
2004
Doctoral Committee Chair(s)
Han, Jiawei
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Computer Science
Language
eng
Abstract
Knowledge Discovery and Data Mining (KDD) has been studied extensively over the last decade as data continues to grow in size and complexity. This thesis introduces three practical data mining problems: (1) classifying large data sets, (2) classifying without negative data (i.e., single-class classification), and (3) discovering discriminant feature combinations. It presents solutions based on a principled methodology, Support Vector Machines (SVMs), that produce higher-quality results with less human intervention. We first address several challenges in adopting SVM technology in the practice of data mining: (1) scalability: SVMs scale poorly with data size, while common data mining applications often involve millions or billions of data objects; (2) applicability: SVMs are limited to (semi-)supervised learning, which is mostly applied to binary classification problems; and (3) interpretability: it is hard to interpret and extract knowledge from SVM models. We then propose three principled solutions that address these challenges for the problems of large-scale classification, single-class classification, and discriminant feature combination discovery. The contributions of this thesis cover applications in bioinformatics and text and Web mining, as well as methodologies of data mining and machine learning.