Toward Enhanced Metadata Quality of Large-Scale Digital Libraries: Estimating Volume Time Range
Author(s)
Guo, Siyuan
Edelblute, Trevor
Dai, Bin
Chen, Miao
Liu, Xiaozhong
Issue Date
2015-03-15
Keyword(s)
data analytics and evaluation
information organization and metadata
text/data/knowledge mining
Abstract
In large-scale digital libraries, it is not uncommon for some bibliographic fields in metadata records to be incomplete or missing. Supplementing incomplete or missing metadata can greatly facilitate users' search and access to digital library resources. Temporal information, such as publication date, is a key descriptor of digital resources. In this study, we investigate text mining methods to automatically resolve missing publication dates for the HathiTrust corpus, a large collection of documents digitized with optical character recognition (OCR). In comparison with previous approaches using only unigrams as features, our experimental results show that methods incorporating higher-order n-gram features, e.g., bigrams and trigrams, can more effectively classify a document into discrete temporal intervals, or "chronons". Our approach can be generalized to classify volumes within other digital libraries.
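The abstract frames date estimation as classifying a volume into a discrete temporal interval (a "chronon") using unigram, bigram, and trigram features. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes whitespace tokenization, 20-year chronons starting at 1800, and a multinomial naive Bayes classifier with add-one smoothing; this record does not say which classifier or interval width the paper used. The names `ngrams`, `chronon`, and `NaiveBayesChronon` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n_max=3):
    """Count all 1..n_max-grams in a token list (n_max=1 gives unigrams only)."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def chronon(year, start=1800, width=20):
    """Map a year to a discrete interval index (start and width are assumptions)."""
    return (year - start) // width


class NaiveBayesChronon:
    """Hypothetical multinomial naive Bayes over n-gram counts, add-one smoothing."""

    def __init__(self, n_max=3):
        self.n_max = n_max
        self.class_docs = Counter()             # documents per chronon
        self.class_feats = defaultdict(Counter)  # n-gram counts per chronon
        self.vocab = set()

    def fit(self, docs, years):
        for text, year in zip(docs, years):
            c = chronon(year)
            feats = ngrams(text.lower().split(), self.n_max)
            self.class_docs[c] += 1
            self.class_feats[c].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        """Return the chronon index with the highest log-posterior score."""
        feats = ngrams(text.lower().split(), self.n_max)
        total_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, ndocs in self.class_docs.items():
            counts = self.class_feats[c]
            denom = sum(counts.values()) + v
            score = math.log(ndocs / total_docs)  # log prior
            for f, k in feats.items():
                score += k * math.log((counts[f] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


if __name__ == "__main__":
    # Toy training data, invented purely for illustration.
    train = [
        "the steam engine and the railway company",
        "the railway company report on steam power",
        "the radio broadcast and the motor car",
        "motor car sales and radio broadcast news",
    ]
    years = [1845, 1850, 1925, 1930]
    model = NaiveBayesChronon(n_max=3).fit(train, years)
    c = model.predict("annual railway company report")
    print(c, 1800 + 20 * c, 1800 + 20 * c + 19)  # prints: 2 1840 1859
```

Setting `n_max=1` reduces the features to unigrams; `n_max=3` adds phrase features such as "railway company" alongside single words, which is the contrast the abstract draws. This toy example does not reproduce the paper's experiments or results.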
Publisher
iSchools
Series/Report Name or Number
iConference 2015 Proceedings
Type of Resource
text
Language
English
Permalink
http://hdl.handle.net/2142/73656
Copyright and License Information
Copyright 2015 is held by the authors. Copyright permissions, when appropriate, must be obtained directly from the authors.