Document expansion and language model re-estimation for information retrieval
Sherman, Garrick
Permalink
https://hdl.handle.net/2142/106203
Description
- Title
- Document expansion and language model re-estimation for information retrieval
- Author(s)
- Sherman, Garrick
- Issue Date
- 2019-11-25
- Director of Research (if dissertation) or Advisor (if thesis)
- Diesner, Jana
- Doctoral Committee Chair(s)
- Diesner, Jana
- Committee Member(s)
- Downie, J. Stephen
- Underwood, Ted
- Arguello, Jaime
- Department of Study
- Information Sciences
- Discipline
- Library & Information Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- information retrieval
- document expansion
- language models
- Abstract
- Document expansion is the process of augmenting the text of a document with text drawn from one or more other documents. The purpose of this expansion is to increase the size of the term sample from which document representations, such as language models, may be estimated. Document expansion has been shown to improve the effectiveness of ad-hoc document retrieval; our work differs from previous work in several ways. We propose a consistent language modeling approach to document expansion of full-length documents. We also explore the use of one or more external document collections as sources of data during the expansion process. Our proposed methods improve retrieval effectiveness over baselines. We also acknowledge that existing document expansion work, including our own, has relied on intuitive assumptions about the mechanisms by which it achieves its effects. In this thesis, we quantify aspects of the change in a document's language model that results from expansion, and we investigate the relationships between these changes and the operations of our model. In doing so, we establish evidence to support prior intuitions; specifically, we find relationships between the quality of a document's representation, which is used to identify appropriate expansion documents, and the expansion model's success in accurately re-estimating the document's language model. Finally, recognizing that applying our model selectively could further improve retrieval effectiveness, we investigate methods for automatically predicting whether to expand an individual document and, if so, which expansion collection is likely to yield the best document representation. We find that, although the document expansion retrieval model is effective overall, accurately predicting whether to expand a given document depends too heavily on predicting that document's relevance. These findings suggest limitations for any model that seeks to optimize scoring on a per-document basis.
- Graduation Semester
- 2019-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/106203
- Copyright and License Information
- Copyright 2019 Garrick Sherman
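The abstract describes document expansion only in general terms. As a rough illustration, and not the dissertation's own model, here is a minimal sketch of one common way to re-estimate a document language model from expansion documents. It assumes unigram language models, a linear interpolation between the original document's maximum-likelihood model and a weighted mixture of expansion documents' models (the weights might come, for example, from retrieval scores obtained by using the document as a query against an internal or external collection), and Dirichlet-smoothed query-likelihood scoring. The function names `ml_model`, `expand_model`, `dirichlet_score` and the parameters `lam` and `mu` are hypothetical.

```python
from collections import Counter
import math


def ml_model(tokens):
    """Maximum-likelihood unigram language model: P(w|D) = c(w, D) / |D|."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def expand_model(doc_tokens, expansion_docs, weights, lam=0.5):
    """Hypothetical helper: re-estimate a document model by interpolation.

    P'(w|D) = lam * P_ML(w|D) + (1 - lam) * sum_i a_i * P_ML(w|E_i),
    where a_i are the expansion-document weights normalized to sum to 1.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("expansion weights must have a positive sum")
    doc_lm = ml_model(doc_tokens)
    mixture = Counter()
    for tokens, weight in zip(expansion_docs, weights):
        for term, prob in ml_model(tokens).items():
            mixture[term] += (weight / total_weight) * prob
    vocab = set(doc_lm) | set(mixture)
    return {
        term: lam * doc_lm.get(term, 0.0) + (1 - lam) * mixture.get(term, 0.0)
        for term in vocab
    }


def dirichlet_score(query, doc_lm, doc_len, coll_lm, mu=2500.0):
    """Query log-likelihood with Dirichlet smoothing toward a collection model.

    p(q|D) = (doc_len * P(q|D) + mu * P(q|C)) / (doc_len + mu).
    Using the original length as doc_len for an expanded model is an assumption.
    """
    score = 0.0
    for term in query:
        p = (doc_len * doc_lm.get(term, 0.0) + mu * coll_lm.get(term, 0.0)) / (doc_len + mu)
        score += math.log(p)  # assumes every query term occurs in the collection
    return score


if __name__ == "__main__":
    doc = ["cat", "sat", "mat"]
    expansions = [["cat", "kitten"], ["feline", "cat"]]
    expanded = expand_model(doc, expansions, weights=[2.0, 1.0], lam=0.5)
    print(round(expanded["cat"], 4))  # 5/12, printed as 0.4167
```

In this sketch, terms that never appear in the original document (here "kitten" and "feline") receive nonzero probability from the expansion documents, which is the enlarged term sample the abstract refers to. Setting `lam` to 1 recovers the unexpanded model, so a per-document decision to leave a document unexpanded corresponds to choosing `lam = 1` for that document.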
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Information Sciences
Dissertations and theses from the School of Information Sciences