Towards more transparent, reproducible, and reusable data cleaning with OpenRefine
Li, Lan; Ludäscher, Bertram; Zhang, Qian
Permalink
https://hdl.handle.net/2142/103330
Description
- Title
- Towards more transparent, reproducible, and reusable data cleaning with OpenRefine
- Author(s)
- Li, Lan
- Ludäscher, Bertram
- Zhang, Qian
- Issue Date
- 2019-03-15
- Keyword(s)
- OpenRefine
- Data cleaning
- Provenance
- Transparency
- Reproducibility
- Reusability
- Abstract
- We study provenance features of OpenRefine, a popular data cleaning tool. In OpenRefine, provenance is available through operation histories and recipes. The former provide users with an undo/redo capability; the latter represent histories in JSON, so recipes can be reused. The model implicit in histories and recipes exhibits both prospective and retrospective provenance features, but is incomplete in at least two ways: neither (i) the functions that result in mass edits nor (ii) single-cell edits are captured, so important prospective and retrospective provenance information, respectively, is missing. We propose to complete the missing information by capturing the names and parameters of user-invoked functions, and by exposing retrospective provenance hidden in internal project files. The feasibility of the approach is demonstrated with an early prototype.
- Publisher
- iSchools
- Series/Report Name or Number
- iConference 2019 Proceedings
- Type of Resource
- text
- Language
- eng
- Permalink
- http://hdl.handle.net/2142/103330
- DOI
- https://doi.org/10.21900/iconf.2019.103330
- Copyright and License Information
- Copyright 2019 Lan Li, Bertram Ludäscher, and Qian Zhang
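The abstract notes that recipes represent operation histories in JSON and proposes capturing the names and parameters of user-invoked functions. The sketch below shows one way such a recipe could be read to list each step's operation name and parameters. It is not the authors' prototype. The sample recipe is invented for illustration; its field names (`op`, `description`, `columnName`, `expression`, and so on) are assumed to match the JSON that OpenRefine exports as an operation history, and `summarize_recipe` is a hypothetical helper.

```python
import json

# Illustrative recipe (assumption): the field names mirror an exported
# OpenRefine operation history, but the steps and values are made up.
RECIPE_JSON = """
[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column name using expression value.trim()",
    "engineConfig": {"mode": "row-based", "facets": []},
    "columnName": "name",
    "expression": "value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/column-rename",
    "description": "Rename column name to full_name",
    "oldColumnName": "name",
    "newColumnName": "full_name"
  }
]
"""


def summarize_recipe(recipe_text):
    """Return a list of (operation name, parameters) pairs, one per recipe step."""
    steps = json.loads(recipe_text)
    summary = []
    for step in steps:
        # Treat every field other than the operation id and the
        # human-readable description as a parameter of the step.
        params = {k: v for k, v in step.items() if k not in ("op", "description")}
        summary.append((step["op"], params))
    return summary


summary = summarize_recipe(RECIPE_JSON)
for op, params in summary:
    print(op, sorted(params))
```

A listing like this covers only what the recipe itself records. Per the abstract, single-cell edits are not captured there and would have to be recovered from the internal project files.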