MEDIATE: Learning to Match Entity Mentions across Text and Databases
Doan, AnHai; Li, Xin; Roth, Dan
Permalink
https://hdl.handle.net/2142/11216
Description
- Title
- MEDIATE: Learning to Match Entity Mentions across Text and Databases
- Author(s)
- Doan, AnHai
- Li, Xin
- Roth, Dan
- Issue Date
- 2006-02
- Keyword(s)
- computer science
- Abstract
- Many real-world applications increasingly involve both structured data and text. A given real-world entity is often referred to in different ways, such as "Helen Hunt" and "Mrs. H. E. Hunt", both within and across the structured data and the text. Due to this semantic heterogeneity, it remains extremely difficult to glue together information about real-world entities from the available data sources and effectively utilize both types of information. This paper describes the MEDIATE system, which automatically matches entity mentions within and across both text and databases. The system can handle multiple types of entities (e.g., people, movies, locations), is easily extensible to new entity types, and operates with no need for annotated training data. Given a relational database and a set of text documents, MEDIATE learns from the data a generative model that provides a probabilistic view on how a data creator might have generated mentions, then applies it to matching the mentions. The model exploits the similarity of mention names, common transformations across mentions, and context information such as age, gender, and entity co-occurrence. To maximize matching accuracy, MEDIATE also propagates information across contexts. Experiments on real-world data show that MEDIATE significantly outperforms existing methods that address aspects of this problem, and that it can exploit text to improve record linkage, and vice versa.
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/11216
- Copyright and License Information
- You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).