Semantic pattern discovery in open information extraction
Chauhan, Aabhas
Permalink
https://hdl.handle.net/2142/108194
Description
- Title
- Semantic pattern discovery in open information extraction
- Author(s)
- Chauhan, Aabhas
- Issue Date
- 2020-05-13
- Director of Research (if dissertation) or Advisor (if thesis)
- Han, Jiawei
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Information Extraction
- Pattern Mining
- Abstract
- Open information extraction (OpenIE) is a novel paradigm that produces structured information from unstructured text with minimal or no supervision. The task involves extracting relevant relation tuples or expressions from a text corpus. Existing methods in the domain tend to produce a large percentage of ill-structured, incomplete, or redundant extractions that cannot be used directly in downstream applications, and they often fail on sentences with long and complex structures. In this paper, we propose SemPatIE, a novel semantic pattern discovery framework for OpenIE, which extracts relations in the form of typed textual pattern structures, called meta patterns, and groups semantically similar pattern structures. To perform these tasks, the framework uses three techniques: (1) it simplifies complex sentence structures with a context-aware sentence segmentation method that splits the dependency graph of a sentence at the noun or verb level, enabling pattern extraction between distantly placed entities; (2) it extracts meta patterns and handles their sparsity problem by introducing a novel idea of iterative frequent pattern mining and nested push-ups; (3) it generates semantic pattern clusters by embedding a multi text-based network between entities, entity types, extracted meta patterns, and context words. Experiments show that SemPatIE outperforms state-of-the-art OpenIE baselines in handling structurally complex sentences and has significantly higher recall than existing pattern-based methods. Case studies exhibit the framework's high generalization ability and scalability, as well as its effective clustering performance, which has direct applications in downstream tasks such as knowledge graph construction, evidence mining, and truth finding.
- Graduation Semester
- 2020-05
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/108194
- Copyright and License Information
- Copyright 2020 Aabhas Chauhan
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science