Entity Retrieval over Structured Data
Fang, Hui; Sinha, Rishi R.; Wu, Wensheng; Doan, AnHai; Zhai, ChengXiang
Permalink
https://hdl.handle.net/2142/11135
Description
- Title
- Entity Retrieval over Structured Data
- Author(s)
- Fang, Hui
- Sinha, Rishi R.
- Wu, Wensheng
- Doan, AnHai
- Zhai, ChengXiang
- Issue Date
- 2005-12
- Keyword(s)
- entity retrieval
- data structures
- Abstract
- Entity retrieval is the problem of finding information about a given real-world entity (e.g., director Peter Jackson) from one or a set of data sources. This problem is fundamental in numerous data management settings, but has received little attention. We define the general entity retrieval problem, then discuss the limitations of current information systems (e.g., relational databases, search engines) in solving it. Next, we focus on the specific problem of entity retrieval over structured data (as opposed to text or Web pages). We show that it is inherently more general and difficult than the actively studied problem of entity matching (i.e., record linkage). We then develop the ENRICH system, which significantly extends entity matching solutions to perform entity retrieval. In particular, ENRICH employs clustering techniques to obtain a global picture of how many entities are "out there" and which data fragment should best be assigned to which entity. It also constructs profiles that capture important characteristics of the target entity, then uses the profiles to help the assignment process. Finally, it leverages "query expansion", an idea commonly used in the information retrieval community, to further improve retrieval accuracy. We apply ENRICH to several real-world domains, and show that it can perform entity retrieval with high accuracy.
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/11135
- Copyright and License Information
- You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).