Utilizing multiple entities from collection of unstructured documents in constructing attribute-value pairs
Cho, Hyun Duk
Permalink
https://hdl.handle.net/2142/34506
Description
- Title
- Utilizing multiple entities from collection of unstructured documents in constructing attribute-value pairs
- Author(s)
- Cho, Hyun Duk
- Issue Date
- 2012-09-18T21:20:36Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhai, ChengXiang
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- attribute extraction
- (attribute-value pair) nvp
- value extraction
- evaluation
- Abstract
- Attribute-value pair (NVP) extraction is defined as extracting words that express characteristics of an entity and associating those words with the words or phrases that best describe the attributes. Applications for NVP arise in various related areas, ranging from sentiment analysis and populating and checking for errors in relational databases to broader text information areas such as QA systems, search, and review modeling. We propose an unsupervised method to identify the properties of entities, represented as NVP, from unstructured documents. Other approaches that extract NVP usually utilize supervised or semi-supervised approaches on structured or semi-structured documents. The benefit of such approaches is that they tend to have higher accuracy than unsupervised approaches on unstructured documents. Furthermore, supervised approaches are better suited to distinguishing attribute words from value words than unsupervised approaches on unstructured documents. The biggest drawback of these methods, however, is that training data may not always be available and not all documents can be thought of as being structured. We first propose in this thesis an approach to extracting and distinguishing attribute words and value words from unstructured documents. Since entities of the same class share similar attributes, we propose that the identification of relevant attributes should be done across entities belonging to the same class, and demonstrate that this can lead to a significant performance gain in attribute extraction, even when only documents describing a modest number of entities per class are available. We then propose a way to evaluate the accuracy of attribute-value pairs automatically, allowing for quantitative comparison between different systems that is more consistent and cost-effective than manual evaluation. Automatic evaluation techniques of this kind have been used in evaluating summarization and comparing ontologies, but they have not been utilized in evaluating NVP.
Both the automated and manual evaluations show that our system outperforms a comparison system.
- Graduation Semester
- 2012-08
- Permalink
- http://hdl.handle.net/2142/34506
- Copyright and License Information
- Copyright 2012 Hyun Duk Cho
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science