A QUESTION OF CHARACTER: How do we automatically recharacterize data at cloud scales?
O’Sullivan, Jack; Clipsham, David; Soni, Divyesh; Smith, Richard; Tilbury, Jonathan
Permalink
https://hdl.handle.net/2142/121087
Description
Issue Date
2023
Keyword(s)
Scalability
Automation
Characterization
Preservation Actions
Abstract
Many preservation actions that we undertake on digital content are driven by the format of that content. Format information is often determined at the point of ingest and is not regularly updated as our knowledge of file formats improves. Periodically re-characterizing all content in a repository would yield more accurate identifications over time, but a more sustainable approach is to re-characterize only content whose characterization is actually likely to have changed. Preservica’s new Automated Active Digital Preservation feature seeks to do exactly this, but even when considering only subsets of the data in our cloud systems, we face significant challenges of scale. In this paper, we describe those challenges, the approach we have taken to implement the feature, and the testing we have performed to verify its viability.
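The abstract does not say how the feature decides which content is likely to have changed. Purely as a hypothetical illustration of the selective idea, and not as Preservica's implementation, the sketch below assumes that each file has a stored format identifier (a PRONOM-style PUID, or None if unidentified) and that a new format-registry release reports the set of PUIDs whose signatures were added or modified. Under those assumptions, only unidentified files and files currently identified as an affected format are selected for re-characterization. The names `FileRecord`, `needs_recharacterization`, `select_candidates` and `affected_puids` are invented for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stored characterization result for one file."""
    file_id: str
    puid: Optional[str]  # format identifier recorded at ingest; None if unidentified


def needs_recharacterization(record: FileRecord, affected_puids: set[str]) -> bool:
    # Assumption: an unidentified file might now match a newly added signature.
    if record.puid is None:
        return True
    # Assumption: a file identified as a format whose signatures changed
    # might now resolve to a different (e.g. more specific) format.
    return record.puid in affected_puids


def select_candidates(
    records: Iterable[FileRecord], affected_puids: set[str]
) -> Iterator[FileRecord]:
    # Filter lazily with a generator so a very large stream of records
    # never has to be held in memory at once.
    return (r for r in records if needs_recharacterization(r, affected_puids))


if __name__ == "__main__":
    # Example PUID strings only; the "affected" set is made up for illustration.
    records = [
        FileRecord("a", "fmt/43"),
        FileRecord("b", None),
        FileRecord("c", "fmt/111"),
    ]
    selected = [r.file_id for r in select_candidates(records, {"fmt/111"})]
    print(selected)  # ['b', 'c']
```

Filtering lazily reflects the scale concern raised in the abstract: the selection step touches each stored record once and only passes the (ideally small) candidate subset on to the expensive characterization step.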
Series/Report Name or Number
iPRES 2023
Type of Resource
text
Language
en
Copyright and License Information
Copyright held by the author(s). The text of this paper is published under a CC BY-SA license (https://creativecommons.org/licenses/by/4.0/).