Data Format Description Language: Lessons Learned, Concepts and Experience
McGrath, Robert E.
Permalink
https://hdl.handle.net/2142/27693
Description
- Title
- Data Format Description Language: Lessons Learned, Concepts and Experience
- Author(s)
- McGrath, Robert E.
- Issue Date
- 2011-09
- Keyword(s)
- Data Format Description Language
- XML Schema
- Open Grid Forum Standard
- Abstract
- For the past six years, as part of the “Innovative Systems and Software: Applications to NARA Research Problems” project, NCSA has contributed to the development of the Open Grid Forum (OGF) standard format description language, the Data Format Description Language (DFDL). A DFDL parser is sufficient to support interpretation of arbitrary binary- or ASCII-formatted files in terms of well-defined logical models.
The Data Format Description Language emerged from a variety of unrelated projects and products, which had various goals and approaches. The goal of the OGF DFDL-WG is to build on previous experience to create a consensus standard that can replace the disparate related efforts. In 2011, the DFDL specification was accepted as a “Proposed Recommendation” of the Open Grid Forum.
The DFDL is a critical new technology for many important use cases, including:
• Access to and manipulation of non-XML data, such as data from sensors or simulations
• Interoperation of data from many independent sources
• Preservation of access to data for long periods of time
• Construction of and access to “virtual datasets” from many sources
This capability is especially interesting for archives that need to preserve access to data for long periods of time.
Beyond maintaining the accessibility of the raw ‘1’s and ‘0’s of digital data, preservation and interoperation require maintaining the ability to interpret the data as meaningful structures, relationships, and visual representations. NCSA has investigated concepts for a general descriptive method for accessing data in arbitrary file formats and presenting the interpreted information in XML and RDF representations, supporting discovery and long-term preservation of content.
This technology has broad application across the curation and preservation processes, and in e-Science more generally. The DFDL has been identified by the US National Archives and Records Administration (NARA) as a priority in the area of Human Computer Interaction and Information Management.
This project has included contributions to the development of the DFDL standard, test implementations of the concepts, and explorations of semantic extensions for DFDL. This document summarizes the activities and presents some lessons learned in the course of this project.
- Type of Resource
- text
- Sponsor(s)/Grant Number(s)
- National Science Foundation Cooperative Agreement NSF OCI 05-25308
- Cooperative Support Agreements NSF OCI 04-38712 and NSF OCI 05-04064 by the National Archives and Records Administration