The Web Archives Workbench (WAW) Tool Suite: Taking an Archival Approach to the Preservation of Web Content
Author(s)
Hswe, Patricia
Kaczmarek, Joanne S.
Houser, Leah
Eke, Janet
Issue Date
2009
Keyword(s)
Digital preservation
National Digital Information Infrastructure Preservation Program (NDIIPP)
Abstract
The ECHO DEPository (also known as ECHO DEP, an abbreviation
for Exploring Collaborations to Harvest Objects in a Digital Environment
for Preservation) is an NDIIPP-partner project led by the
University of Illinois at Urbana-Champaign in collaboration with
OCLC and a consortium of partners, including five state libraries and
archives. A core deliverable of the project’s first phase was OCLC’s
development of the Web Archives Workbench (WAW), an open-source
suite of Web archiving tools for identifying, describing, and
harvesting Web-based content for ingestion into an external digital
repository. Released in October 2007, the suite is designed to bridge
the gap between manual selection and automated capture based on
the “Arizona Model,” which applies a traditional aggregate-based
archival approach to Web archiving. Aggregate-based archiving refers
to archiving items by group or in series, rather than individually. Core
functionality of the suite includes the ability to identify Web content
of potential interest through crawls of “seed” URLs and the domains
they link to; tools for creating and managing metadata for association
with harvested objects; website structural analysis and visualization
to aid human content selection decisions; and packaging using a
PREMIS-based METS profile developed by the ECHO DEPository
to support easier ingestion into multiple repositories. This article
provides background on the Arizona Model; an overview of how the
tools work and their technical implementation; and a brief summary
of user feedback from testing and implementing the tools.
Publisher
Johns Hopkins University Press and the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
ISSN
0024-2594
Type of Resource
text
Language
en
Permalink
http://hdl.handle.net/2142/13595
Copyright and License Information
Copyright 2009 Board of Trustees of the University of Illinois.