Contextual Indexing and Joining: Supporting Efficient, Scalable Entity Search
Cheng, Tao; Chang, Kevin Chen-Chuan
Permalink
https://hdl.handle.net/2142/11405
Description
- Title
- Contextual Indexing and Joining: Supporting Efficient, Scalable Entity Search
- Author(s)
- Cheng, Tao
- Chang, Kevin Chen-Chuan
- Issue Date
- 2007-10
- Keyword(s)
- computer science
- Abstract
- As the Web has evolved into an entity-abundant repository, current search engines, with their standard "page view", are becoming increasingly inadequate for a wide range of query tasks. Entity search, a significant departure from document retrieval, finds fine-grained information, i.e., entities, embedded in documents directly and holistically across the whole collection. Essentially, entity search finds matching entities by context patterns within each document and aggregates them across documents for ranking. This text-based pattern matching suggests that standard inverted-list-based query processing can be applied. However, this baseline is limited in both efficiency, due to long entity lists, and scalability, due to cross-document aggregation. To enhance efficiency, we propose the "contextual index", an index that materializes pre-joins, to eliminate unnecessary index reading and reduce online matching. To improve scalability, we propose "entity-space" partitioning, so that answer subspaces can be aggregated locally. We derive our design from both the functional and the operational definitions of entity search, and show that both lead to the same framework. We evaluate the indexing (contextual indexing) and parallel query processing (contextual joining) framework over a 2TB real Web corpus with systematic benchmark query sets. Experiments show that our scheme speeds up query processing by two orders of magnitude on average over the baseline.
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/11405
- Copyright and License Information
- You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).