Withdraw
Loading…
Scalable centralized and distributed spectral clustering
Bhatti, Shahzad Fazal
Loading…
Permalink
https://hdl.handle.net/2142/101769
Description
- Title
- Scalable centralized and distributed spectral clustering
- Author(s)
- Bhatti, Shahzad Fazal
- Issue Date
- 2018-06-26
- Director of Research (if dissertation) or Advisor (if thesis)
- Beck, Carolyn L.
- Doctoral Committee Chair(s)
- Beck, Carolyn L.
- Committee Member(s)
- Srikant, Rayadurgam
- Salapaka, Srinivasa M.
- Marla, Lavanya
- Department of Study
- Industrial&Enterprise Sys Eng
- Discipline
- Industrial Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Clustering algorithms
- community detection
- graph partitioning
- random walk
- distributed algorithms
- Abstract
- Spectral clustering approaches have led to well-accepted algorithms for finding accurate clusters in a given dataset. However, their application to large-scale datasets has been hindered by the computational complexity of eigenvalue decompositions. Several algorithms have been proposed in the recent past to accelerate spectral clustering, however, they compromise on the accuracy of the spectral clustering to achieve faster speed. In this paper, we propose a novel spectral clustering algorithm based on a mixing process on a graph. Unlike the existing spectral clustering algorithms, our algorithm does not require computing eigenvectors. Specifically, it finds the equivalent of a linear combination of eigenvectors of the normalized similarity matrix weighted with corresponding eigenvalues. This linear combination is then used to partition the dataset into meaningful clusters. Simulations on real datasets show that partitioning datasets based on such linear combinations of eigenvectors achieve better accuracy than standard spectral clustering methods as the number of clusters increase. Our algorithm can easily be implemented for parallel processing. In the past few years, the size of a typical dataset has grown exponentially making it impossible to the store the data in a single system. Thus distributed systems are employed to store the data. Most of the clustering algorithms are tailored towards data stored in a centralized system which makes them inappropriate for the distributed system. Moreover, the large scale of the data prohibits us from moving it to a central location to use a centralized algorithm. Our approach to distributed spectral clustering works in two phases. In phase 1, individual machines generate a set of representative points of the local data and communicate it to a central machine. In phase 2, the central machine performs spectral clustering on the data and communicates the cluster assignment of the representative points to the corresponding nodes. We have explored various algorithms to generate the representative points and compare their trade-offs and accuracy. Our algorithm can easily be cast in the MapReduce framework.
- Graduation Semester
- 2018-08
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/101769
- Copyright and License Information
- Copyright 2018 Shahzad Bhatti
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…