IDEALS, University of Illinois at Urbana-Champaign

Utilizing multiple entities from collection of unstructured documents in constructing attribute-value pairs


Bookmark or cite this item: http://hdl.handle.net/2142/34506

Files in this item

Cho_Hyun Duk.pdf (PDF, 2MB). Access: restricted to U of Illinois. Description: none provided.
Title: Utilizing multiple entities from collection of unstructured documents in constructing attribute-value pairs
Author(s): Cho, Hyun Duk
Advisor(s): Zhai, Chengxiang
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): attribute extraction; attribute-value pair (NVP); value extraction; evaluation
Abstract: Attribute-value pair (NVP) extraction is defined as identifying words that express characteristics of an entity and associating them with the words or phrases that best describe those attributes. Applications of NVP range from related areas such as sentiment analysis and populating and error-checking relational databases to broader text information tasks such as QA systems, search, and review modeling. We propose an unsupervised method to identify the properties of entities, represented as NVPs, from unstructured documents. Other approaches to NVP extraction usually apply supervised or semi-supervised methods to structured or semi-structured documents. The benefit of such approaches is that they tend to achieve higher accuracy than unsupervised approaches on unstructured documents. Furthermore, supervised approaches are better suited than unsupervised approaches on unstructured documents to distinguishing attribute words from value words. The biggest drawback of these methods, however, is that training data may not always be available and not all documents can be treated as structured. In this thesis we first propose an approach to extracting and distinguishing attribute words and value words from unstructured documents. Since entities of the same class share similar attributes, we propose that relevant attributes be identified across entities belonging to the same class, and we demonstrate that this can lead to a significant performance gain in attribute extraction, even when documents describing only a modest number of entities per class are available. We then propose a way to evaluate the accuracy of attribute-value pairs automatically, allowing quantitative comparison between different systems that is more consistent and cost-effective than manual evaluation. Automatic evaluation techniques of this kind have been used to evaluate summarization and to compare ontologies, but they have not previously been applied to evaluating NVP.
Both the automated and manual evaluations show that our system outperforms a comparison system.
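The two ideas in the abstract, pooling attribute evidence across entities of the same class and scoring extracted pairs automatically, can be illustrated with a minimal sketch. Everything in it is an assumption, not the thesis's method: the regular-expression candidate extractor, the hypothetical names extract_class_pairs and precision_recall_f1, and the min_support threshold are illustrative only, and exact-match precision/recall against a reference set stands in for the thesis's automatic evaluation rather than describing it.

```python
import re
from collections import defaultdict

# Hypothetical surface pattern: "the/its <attribute> is/are/was <value>".
# A stand-in candidate extractor for illustration; NOT the thesis's method.
PATTERN = re.compile(r"\b(?:the|its)\s+(\w+)\s+(?:is|are|was)\s+(\w+)", re.IGNORECASE)


def extract_candidates(text):
    """Return lower-cased (attribute, value) candidate pairs from one document."""
    return [(a.lower(), v.lower()) for a, v in PATTERN.findall(text)]


def extract_class_pairs(docs_by_entity, min_support=2):
    """Keep only attributes mentioned for at least `min_support` entities.

    docs_by_entity maps an entity name to a list of documents (strings);
    all entities are assumed to belong to the same class.
    """
    candidates = {}               # entity -> set of (attribute, value)
    support = defaultdict(set)    # attribute -> entities whose documents mention it
    for entity, docs in docs_by_entity.items():
        pairs = set()
        for doc in docs:
            pairs.update(extract_candidates(doc))
        candidates[entity] = pairs
        for attr, _ in pairs:
            support[attr].add(entity)

    # Attributes shared across entities are treated as class-level attributes.
    class_attrs = {a for a, ents in support.items() if len(ents) >= min_support}
    return {
        (entity, attr, value)
        for entity, pairs in candidates.items()
        for attr, value in pairs
        if attr in class_attrs
    }


def precision_recall_f1(predicted, gold):
    """Exact-match scoring of (entity, attribute, value) triples against a gold set."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    docs = {
        "camera_a": ["The resolution is high and the battery is weak."],
        "camera_b": ["Its resolution is low. The strap was red."],
        "camera_c": ["The battery is strong."],
    }
    pairs = extract_class_pairs(docs, min_support=2)
    print(sorted(pairs))
    gold = {("camera_a", "resolution", "high"), ("camera_b", "resolution", "low")}
    print(precision_recall_f1(pairs, gold))
```

In this toy run, "strap" appears for only one entity and is dropped, while "resolution" and "battery" survive because two entities share each of them; lowering min_support to 1 reduces the sketch to per-entity extraction.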
Issue Date: 2012-09-18
URI: http://hdl.handle.net/2142/34506
Rights Information: Copyright 2012 Hyun Duk Cho
Date Available in IDEALS: 2012-09-18
Date Deposited: 2012-08

