Withdraw
Loading…
Density-Based Clustering of Information Networks by Substructure Optimization
Bortner, Dustin P.
Loading…
Permalink
https://hdl.handle.net/2142/15489
Description
- Title
- Density-Based Clustering of Information Networks by Substructure Optimization
- Author(s)
- Bortner, Dustin P.
- Issue Date
- 2010-05
- Keyword(s)
- data mining
- clustering
- graph mining
- information network
- Language
- en
- Abstract
- Information networks, such as biological or social networks, contain groups of related entities, which can be identified by clustering. Density-based clustering (DBC) differs from vertex-partitioning methods in that some vertices are classified as noise. This approach is useful in practice to classify groups of related entities within noisy networks. The baseline DBC method involves constructing a maximal spanning forest (MSF) and deleting edges having weights below a threshold, leaving the connected components as clusters. In large networks, the data may contain large scale variances in the noise density level, which causes the baseline method to perform poorly. We investigate whether clustering within substructures of the MSF can improve the result. We present a new set of similarity measures for weighted networks that allow the MSF to connect vertices that are not connected in the original network. We convert the MSF to a hierarchy and merge it bottom-up by optimizing an objective function to produce a flat clustering. We present two new objective functions based on density and irregularity measures. Experiments show that the new similarity measures help to normalize noise density levels and outperform traditional set-based similarity measures; that the proposed substructure optimization method improves over the baseline in networks with many classes; and that DBC outperforms vertex-partitioning methods for classification in noisy networks. The results of the irregularity objective are promising, as it scales with the number of vertices, whereas objectives such as density or modularity scale with the number of edges.
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/15489
Owning Collections
Dissertations and Theses - Computer Science PRIMARY
Dissertations and Theses from the Dept. of Computer ScienceManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…