Targeted Query Expansions as a Method for Searching Mixed Quality Digitized Cultural Heritage Documents
Author(s)
Keskustalo, Heikki
Kettunen, Kimmo
Kumpulainen, Sanna
Ferro, Nicola
Silvello, Gianmaria
Järvelin, Anni
Kekäläinen, Jaana
Arvola, Paavo
Sormunen, Eero
Järvelin, Kalervo
Saastamoinen, Miamaria
Issue Date
2015-03-15
Keyword(s)
cultural institutions
information seeking/retrieval
archives and records
Abstract
Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each other in the level of such variation, digitized collections may contain documents of mixed quality. Different types of documents may require different retrieval methods. We suggest using targeted query expansions (QE) to access documents in mixed-quality text collections. In QE, the user-given search term is replaced by a set of expansion keys (search words); in targeted QE, the selection of expansion keys is based on the type of surface-level variation occurring in the particular text searched. We illustrate our approach in a highly inflectional compounding language, Finnish, although such variation occurs across all natural languages. We report a minimal-scale experiment based on the QE method and discuss the need to support targeted QEs in the search interface.
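The idea of targeted QE described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the suffix list, the v/w substitution, and the OCR confusion pairs are all hypothetical toy assumptions chosen only to show how expansion keys might be selected per variation type.

```python
from typing import Callable, Dict, Iterable, Set


def morphological_variants(term: str) -> Set[str]:
    # Toy assumption: append a few made-up suffixes to mimic inflected forms.
    return {term + suffix for suffix in ("", "n", "ssa", "sta")}


def historical_variants(term: str) -> Set[str]:
    # Toy assumption: older spellings may use "w" where modern text has "v".
    return {term, term.replace("v", "w")}


def ocr_variants(term: str) -> Set[str]:
    # Toy assumption: a couple of illustrative OCR character confusions.
    variants = {term}
    for correct, confused in (("l", "i"), ("e", "c")):
        variants.add(term.replace(correct, confused))
    return variants


# Hypothetical registry mapping a variation type to its key generator.
GENERATORS: Dict[str, Callable[[str], Set[str]]] = {
    "morphological": morphological_variants,
    "historical": historical_variants,
    "ocr": ocr_variants,
}


def targeted_expansion(term: str, variation_types: Iterable[str]) -> Set[str]:
    """Replace a search term with expansion keys for the given variation types only."""
    keys: Set[str] = set()
    for variation_type in variation_types:
        keys |= GENERATORS[variation_type](term)
    return keys


if __name__ == "__main__":
    # A clean modern text might need only morphological keys,
    # while a noisy historical scan might need historical and OCR keys.
    print(sorted(targeted_expansion("vene", ["morphological"])))
    print(sorted(targeted_expansion("vene", ["historical", "ocr"])))
```

In this sketch the caller decides which variation types apply to the text being searched, which is the "targeted" part; how that decision would be made in practice is left open here.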
Publisher
iSchools
Series/Report Name or Number
iConference 2015 Proceedings
Type of Resource
text
Language
English
Permalink
http://hdl.handle.net/2142/73430
Copyright and License Information
Copyright 2015 is held by the authors. Copyright permissions, when appropriate, must be obtained directly from the authors.