What Dataset Descriptions Actually Describe: Using the Systematic Assertion Model to Connect Theory and Practice

Wickett, Karen M.; Thomer, Andrea; Sacchi, Simone; Baker, Karen S.; Dubin, David

Permalink

https://hdl.handle.net/2142/30470

Description

Title

What Dataset Descriptions Actually Describe: Using the Systematic Assertion Model to Connect Theory and Practice

Author(s)

Wickett, Karen M.
Thomer, Andrea
Sacchi, Simone
Baker, Karen S.
Dubin, David

Issue Date

2012-03-22

Keyword(s)

Metadata, Scientific Data, Data Curation

Abstract

Scientific data is encoded and described with the aim of supporting retrieval, meaningful interpretation, and reuse. Encoding standards for datasets, such as FGDC, DwC, and EML, typically include tagged metadata elements along with the encoded data, suggesting that, per the Dublin Core 1:1 principle, those elements apply to one and only one entity (a specimen, observation, dataset, etc.). In practice, however, vocabularies are often used to describe different dimensions of scientific data collection and communication processes. Discriminating these aspects offers a more precise account of how symbols and the propositions they express acquire the status of “data” and “data content,” respectively. In this poster we present an analysis of species occurrence records based on the Systematic Assertion Model (SAM) [DWS]. SAM is a framework for describing the encoding and representation of scientific data, bridging the gap between data preservation models and discipline-specific scientific ontologies. The model is intended to be general enough for any scientific domain, and is not bound to any particular methodology or field of study. Because species occurrence records are a kind of data that is frequently reused, migrated across systems, and shared, they are a good target for analysis. Sample data is reviewed in the context of SAM and analyzed with respect to the provenance events, entities, and relationships governing our definitions of data and data content. The exercise serves to: (1) highlight targets for data description (expression, content, assertion, justification); (2) inform the discovery of anomalous or missing contextual/background information; (3) frame a comparison of generic metadata standards (e.g. Dublin Core) with standards created specifically for scientific use (FGDC, DwC, EML); and (4) clarify competing criteria for the identification of data that is tied to the scientific assertions carried by a dataset, and not specific to the details of a format or encoding.

Publisher

American Society for Information Science and Technology

Type of Resource

image

Copyright and License Information

All rights reserved by the authors.

Owning Collections

Research Presentations - CIRSS PRIMARY