A study of the impact of global statistics in distributed information retrieval
Sehrawat, Nipun
Permalink
https://hdl.handle.net/2142/31920
Description
Title
A study of the impact of global statistics in distributed information retrieval
Author(s)
Sehrawat, Nipun
Issue Date
2012-06-27T21:19:22Z
Director of Research (if dissertation) or Advisor (if thesis)
Zhai, ChengXiang
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Distributed Information Retrieval
Global Statistics
Retrieval Performance
Abstract
Today’s information retrieval systems must handle very large data collections and take a distributed approach to achieve scalable retrieval performance. The most widely used approach, called document-partitioning, partitions the data among multiple search-nodes, each of which indexes its sub-collection independently and is responsible for scoring the documents in its index against queries. Most widely used document scoring functions depend on various global (collection-wide) statistics, such as the document frequency of terms. However, because search-nodes do not have access to global statistics and rely on local (sub-collection-wide) statistics for scoring, document-partitioning can degrade retrieval performance. In this thesis, we study the impact of the lack of global statistics on the retrieval performance of a distributed information retrieval (DIR) system. Our experiments show that performance, as indicated by multiple measures, degrades as the number of search-nodes is increased. We thus conclude that global statistics are essential to retrieval performance in a distributed setup. Finally, we present a novel scheme for lazy and adaptive dissemination of global statistics in a document-partitioned DIR system.
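The effect described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the toy collection, the two-node split, the smoothed IDF formula, and all function names are assumptions chosen for illustration. It shows how the same query term receives different weights on different search-nodes when each node uses only its local document frequencies, which makes scores from different nodes incomparable at merge time.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    return df


def idf(term, df, n_docs):
    """Smoothed IDF (an assumed variant): log((N + 1) / (df + 1)) + 1."""
    return math.log((n_docs + 1) / (df.get(term, 0) + 1)) + 1.0


def score(query, doc, df, n_docs):
    """TF-IDF score: sum over query terms of term frequency times IDF."""
    tf = Counter(doc.split())
    return sum(tf[t] * idf(t, df, n_docs) for t in query.split())


# Hypothetical toy collection, document-partitioned across two search-nodes.
node_a = ["apple pie recipe", "apple tart", "apple cider"]
node_b = ["banana bread", "apple banana smoothie", "banana split"]
all_docs = node_a + node_b

local_df_a = doc_freq(node_a)
local_df_b = doc_freq(node_b)
global_df = doc_freq(all_docs)

query = "apple"
doc_a = node_a[1]  # "apple tart", contains "apple" once
doc_b = node_b[1]  # "apple banana smoothie", contains "apple" once

# Each node scores with its own local statistics.
local_a = score(query, doc_a, local_df_a, len(node_a))
local_b = score(query, doc_b, local_df_b, len(node_b))

# Both documents scored with collection-wide statistics.
global_a = score(query, doc_a, global_df, len(all_docs))
global_b = score(query, doc_b, global_df, len(all_docs))

print(f"local:  A={local_a:.3f}  B={local_b:.3f}")
print(f"global: A={global_a:.3f}  B={global_b:.3f}")
```

In this toy setup both documents contain the query term once, so with global statistics they score identically. With local statistics, the document on the node where "apple" is rare scores higher than the document on the node where "apple" appears in every document, purely because of how the collection happened to be partitioned.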