The application of file identification, validation, and characterization tools in digital curation
Ford, Kevin M.
Permalink
https://hdl.handle.net/2142/24301
Description
- Title
- The application of file identification, validation, and characterization tools in digital curation
- Author(s)
- Ford, Kevin M.
- Issue Date
- 2011-05-25T14:56:50Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Cragin, Melissa H.
- McDonough, Jerome P.
- Department of Study
- Library & Information Science
- Discipline
- Library & Information Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Digital curation
- Digital preservation
- File identification
- File validation
- File characterization
- Preservation tools
- Preservation software
- Abstract
- File format identification, characterization, and validation are considered essential processes for digital preservation and, by extension, long-term data curation. These actions are performed on data objects by humans or computers, in an attempt to identify the type of a given file, derive characterizing information that is specific to the file, and validate that the given file conforms to its type specification. The present research reviews the literature surrounding these digital preservation activities, including their theoretical basis and the publications that accompanied the formal release of tools and services designed in response to their theoretical foundation. It also reports the results from extensive tests designed to evaluate the coverage of some of the software tools developed to perform file format identification, characterization, and validation actions. Tests of these tools demonstrate that more work is needed - particularly in terms of scalable solutions - to address the expanse of digital data to be preserved and curated. The breadth of file types these tools are anticipated to handle is so great as to call into question whether a scalable solution is feasible, and, more broadly, whether such efforts will offer a meaningful return on investment. Also, these tools, which serve to provide a type of baseline reading of a file in a repository, can be easily tricked. It is possible to generate files with nothing more than a proper file extension and correct magic number and have the tools "positively" identify the file. This is not the same as a file that conforms to its specification, and one that could be considered valid. The ability to manipulate the results returned by these tools raises issues of identity, trust, security and risk.
- Graduation Semester
- 2011-05
- Permalink
- http://hdl.handle.net/2142/24301
- Copyright and License Information
- Copyright 2011 Kevin Ford. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. A copy of the license is available at http://creativecommons.org/licenses/by-nc-nd/3.0/
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Information Sciences
Dissertations and theses from the School of Information Sciences