Joint classification and information extraction framework
Rameshkumar, Revanth
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with your NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/91561
Description
Title
Joint classification and information extraction framework
Author(s)
Rameshkumar, Revanth
Contributor(s)
Nahrstedt, Klara
Issue Date
2016-05
Keyword(s)
natural language processing
machine learning models
joint models
text classification
information extraction
Abstract
This thesis proposes a joint Information-Extraction and Classification model for document analysis in domain specific text. Existing information extraction (IE) systems typically try to extract key value pairs or target phrases by learning from user-provided examples or depend on a strong named-entity tagger, as in the Snowball information extraction system. Others, while not depending on user provided IE patterns, end up depending on part of speech, syntactic, or semantic tagged data to extract target phrases; or depend on heavily annotated text to build a learning dictionary. The disadvantage with this is that it takes many man-hours to build a usable training dataset. This is especially disadvantageous when the cost of assigning a domain expert to tasks like tagging and annotating is too great to be practical. This thesis describes a prototype system RICE (Rev’s Iterative Classifier Extractor) that is able to extract information from domain specific text using only a set of labeled (domain relevant or domain irrelevant) documents. The system is trained using only labeled documents and outputs a set of relevant phrases, an Information Extraction Pattern ranker model, and a usable document classifier. An iterative approach is used where extracted noun phrases are used to both simultaneously train a classifier and build a ranked IE Pattern list. The results show that the joint classification and IE model approach definitely works and produces results that are greater enough than chance that the model is worth further pursuit. In fact, it definitely has the potential to be used in production systems. However, there is quite a bit of work that needs to be done to eliminate noise and increase precision. We also discuss next steps, improvements, applications, and future works at the end of the thesis.
Use this login method if you
don't
have an
@illinois.edu
email address.
(Oops, I do have one)
IDEALS migrated to a new platform on June 23, 2022. If you created
your account prior to this date, you will have to reset your password
using the forgot-password link below.