Exploring the Billions and Billions of Words in the HathiTrust Corpus with Bookworm: HathiTrust+Bookworm Project Technical Report
Downie, J Stephen; Lieberman-Aiden, Erez
Permalink
https://hdl.handle.net/2142/112750
Description
Contributor(s)
Organisciak, Peter
Schmidt, Benjamin
Bhattacharyya, Sayan
Jett, Jacob
Issue Date
2017-12-07
Keyword(s)
HathiTrust
Natural Language Processing
Metadata
Data Visualization
Abstract
Bookworm is a tool that visualizes language usage trends at large scales, designed to be powerful but simple. It allows multi-faceted slicing and dicing of the data against a set of content-based and metadata-based features. Our recent work with the HathiTrust+Bookworm (HT+BW) project has focused on improving Bookworm's ability to scale for large collections, while supporting an implementation of Bookworm over one of the largest digital book collections: the HathiTrust Digital Library. The implementation allows scholars to explore the full HathiTrust corpus, but with the control to compare on the basis of such features as subject classification, place of publication, genre, and language. It also provides tools for improved future implementations of Bookworm over non-HathiTrust collections.
Publisher
Center for Informatics Research in Science and Scholarship, School of Information Sciences, University of Illinois at Urbana-Champaign
Type of Resource
text
Language
en
Sponsor(s)/Grant Number(s)
National Endowment for the Humanities (#HK-50176-14)