Bicriterion Clustering and Selecting the Optimal Number of Clusters via Agreement Measure
Liu, Heng
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink: https://hdl.handle.net/2142/87410
Description
Title: Bicriterion Clustering and Selecting the Optimal Number of Clusters via Agreement Measure
Author(s): Liu, Heng
Issue Date: 2007
Doctoral Committee Chair(s): Douglas Simpson
Department of Study: Statistics
Discipline: Statistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree Name: Ph.D.
Degree Level: Dissertation
Keyword(s): Statistics
Language: English
Abstract
Clustering and classification are important tools for a broad range of problems in fields such as image analysis and genomics. A clustering problem can be divided into two parts: estimating the number of clusters, and allocating each observation to a cluster. Many heuristic criteria are available; representative methods include k-means, hierarchical clustering, and partitioning around medoids (PAM). All of these methods leave open the problem of selecting the number of clusters. In addition, some algorithms, such as k-means, depend on a starting allocation of the observations, which may introduce bias. As a result, the partitions produced by different criteria and algorithms are often inconsistent. In this thesis, we propose an approach that selects the number of clusters by comparing and optimizing the agreement between two clustering criteria. The intuition is that the randomness in the clusterings produced by different criteria should be minimized when the true clustering structure is recovered. By maximizing the agreement between the two methods on the allocation of observations, the approach selects the optimal number of clusters and also yields a robust consensus set of clusters. Furthermore, we use a number of classification rules to combine the clusters produced by the two algorithms. The favorable performance of the method is demonstrated in simulation studies and in an application to fMRI time series. Finally, the asymptotic properties of the agreement statistics are discussed.
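The selection idea can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: the two criteria are k-means (with a deterministic farthest-first start) and complete-linkage hierarchical clustering, the agreement measure is the adjusted Rand index, and the "consensus set" is formed by a simple rule that keeps only observations both methods place in the same (label-aligned) cluster. The thesis's own agreement statistic and classification rules may differ, and every function name below is hypothetical.

```python
import itertools
import numpy as np


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings (1.0 = identical up to relabeling)."""
    a, b = np.asarray(a), np.asarray(b)
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    # Contingency table: rows = clusters of a, columns = clusters of b.
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    np.add.at(table, (a_idx, b_idx), 1)

    def pairs(x):
        return x * (x - 1) / 2  # number of unordered pairs

    sum_ij = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(a.size)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return float((sum_ij - expected) / (max_index - expected))


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(d2)])  # next center: point farthest from chosen ones
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])  # keep an empty cluster's old center
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def complete_linkage(X, k):
    """Agglomerative clustering with complete linkage, cut at k clusters."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)
    members = {i: [i] for i in range(n)}
    while len(members) > k:
        idx = list(members)
        sub = D[np.ix_(idx, idx)]
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        i, j = idx[a], idx[b]
        # Lance-Williams update for complete linkage: distance to merged cluster is the max.
        row = np.maximum(D[i], D[j])
        D[i, :] = row
        D[:, i] = row
        D[i, i] = np.inf
        members[i].extend(members.pop(j))
    labels = np.empty(n, dtype=int)
    for c, m in enumerate(members.values()):
        labels[m] = c
    return labels


def select_k(X, ks):
    """Pick the k at which the two criteria agree most (ties go to the smallest k)."""
    scores = {k: adjusted_rand_index(kmeans(X, k), complete_linkage(X, k)) for k in ks}
    return max(scores, key=scores.get), scores


def consensus(a, b, k):
    """Align b's labels to a's, keep agreed observations, mark disputed ones -1."""
    perm = max(itertools.permutations(range(k)),
               key=lambda p: int(np.sum(a == np.array(p)[b])))
    b_aligned = np.array(perm)[b]
    return np.where(a == b_aligned, a, -1)


# Usage on simulated data: three well-separated Gaussian groups of 20 points each.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in true_centers])
best_k, scores = select_k(X, range(2, 7))
labels = consensus(kmeans(X, best_k), complete_linkage(X, best_k), best_k)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

With well-separated groups, both criteria recover the same partition at the true number of clusters, so the agreement there is 1. A caveat of this naive argmax rule is that the two methods can also agree perfectly at a smaller k if they happen to merge the same groups; breaking ties toward the smallest k is a design choice of this sketch, not a property of the thesis's method.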