Documents representation for comparable corpora clustering: A preliminary study
Ma, Shutian; Zhang, Chengzhi
Permalink
https://hdl.handle.net/2142/96740
Description
- Title
- Documents representation for comparable corpora clustering: A preliminary study
- Author(s)
- Ma, Shutian
- Zhang, Chengzhi
- Issue Date
- 2017
- Keyword(s)
- Comparable corpora clustering
- Document representation method
- Abstract
- With increasing globalization, digital libraries tend to provide access to multilingual documents. A large amount of text covering the same or similar topics is now available in multiple languages; such collections are known as comparable corpora. To better organize this information through clustering, we previously explored three document representation methods for comparable corpora clustering: Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), and Doc2Vec (D2V). The comparable corpora used in that earlier work were small, on the order of hundreds of documents. In this poster, we use comparable corpora of a more typical size. The methods are found to perform differently as the dimensionality of the representation varies. We examine the clustering results for each representation method and discuss how to choose the best method for comparable corpora clustering.
- Publisher
- iSchools
- Series/Report Name or Number
- iConference 2017 Proceedings
- Type of Resource
- text
- Language
- en
- Permalink
- http://hdl.handle.net/2142/96740
- Copyright and License Information
- Copyright 2017 Shutian Ma and Chengzhi Zhang