Machine Learning for Information Extraction

Zelenko, Dmitry

This item is available for download only to members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view it. To access an Illinois-restricted dissertation or thesis from outside the university, request a copy through your library's Inter-Library Loan office or purchase one directly from ProQuest.

Permalink

https://hdl.handle.net/2142/81632

Description

Title: Machine Learning for Information Extraction
Author(s): Zelenko, Dmitry
Issue Date: 2003
Doctoral Committee Chair(s): Roth, Dan
Department of Study: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree Name: Ph.D.
Degree Level: Dissertation
Keyword(s): Artificial Intelligence
Language: eng
Abstract: The dissertation presents a number of novel machine learning techniques and applies them to information extraction. The study addresses several information extraction subtasks: part-of-speech tagging, entity extraction, coreference resolution, and relation extraction. Each task is formalized as a learning problem, and appropriate learning algorithms are developed and applied to it. The dissertation studies part-of-speech tagging as a multi-class classification problem and applies the SNoW (Sparse Network of Winnows) learning system to learn a part-of-speech classifier. A comprehensive experimental evaluation of the system confirms that it is appropriate for NLP applications. The dissertation addresses the problem of entity extraction in conjunction with coreference resolution. A classification approach is presented for entity extraction, and coreference resolution is treated from the decoding perspective. The dissertation describes novel decoding algorithms that, given local coreference decisions, produce a globally coherent interpretation of document entities. The dissertation studies relation extraction as a classification problem and applies kernel methods to learn the relation classifiers. Novel kernels are defined in terms of shallow parses, and efficient algorithms are given for computing the kernels. The study evaluates the kernel approach experimentally, with positive results. The dissertation combines the constituent solutions into a single coherent information extraction system and concludes that machine learning is a viable methodology for designing natural language processing applications.
Graduation Semester: 2003
Type of Resource: text
Permalink: http://hdl.handle.net/2142/81632

Owning Collections

Graduate Dissertations and Theses at Illinois (primary collection)

Dissertations and Theses - Computer Science
