DataONE Provenance Demo

Overview

DataONE is a federated data network focusing on earth and environmental science data. DataONE provenance systems enable reproducible research and facilitate proper attribution of scientific results transitively across generations of derived data products.

In this demonstration we describe two provenance features. The first is the RunManager, an API for capturing retrospective provenance from R and Matlab script executions. The second is YesWorkflow, a script annotation tool for prospective provenance, designed to help developers and users better understand the structure and intent of a script.

The RunManager tracks the file I/O events of a script execution and determines the provenance relationships among the file objects involved. Information about each script execution is stored in a cache. The RunManager provides functions for capturing, searching, archiving, and sharing provenance. The supported functions are listed at DataONE Run Manager and API for Capturing Provenance in Script Executions.
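The idea of recording file I/O events and deriving relationships from them can be illustrated with a small Python sketch. This is not the RunManager API: the ToyRun class, its methods, and the file names are hypothetical, and the relation names (used, wasGeneratedBy, wasDerivedFrom) are borrowed from W3C PROV for illustration. It also makes the simplifying assumption that every output is derived from every input read during the run.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ToyRun:
    """Hypothetical record of one script execution (not the DataONE API)."""
    script: str
    reads: list = field(default_factory=list)
    writes: list = field(default_factory=list)

    def read_text(self, path):
        # Record the read event, then perform the actual read.
        self.reads.append(str(path))
        return Path(path).read_text()

    def write_text(self, path, text):
        # Record the write event, then perform the actual write.
        self.writes.append(str(path))
        Path(path).write_text(text)

    def relationships(self):
        # Assumption: each output is derived from every input of the run.
        rels = [(self.script, "used", r) for r in self.reads]
        rels += [(w, "wasGeneratedBy", self.script) for w in self.writes]
        rels += [(w, "wasDerivedFrom", r)
                 for w in self.writes for r in self.reads]
        return rels


def demo():
    """Run a tiny 'script' through ToyRun and return the run and its paths."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.csv"
        out = Path(tmp) / "output.csv"
        src.write_text("1\n2\n3\n")

        run = ToyRun(script="process.py")
        total = sum(int(line) for line in run.read_text(src).split())
        run.write_text(out, f"{total}\n")
        return run, src, out


if __name__ == "__main__":
    run, _, _ = demo()
    for subject, relation, obj in run.relationships():
        print(subject, relation, obj)
```

A real capture tool intercepts I/O calls transparently rather than asking the script to call wrapper methods; the wrappers here only keep the sketch short.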

DataONE provides two RunManager implementations: the recordr R package for R scripts and the DataONE Matlab toolbox for Matlab scripts. See the Matlab-DataONE User Guide and Recordr-DataONE Introduction for more information on the DataONE provenance tools. File bugs and feature requests at Matlab-DataONE Github Repository and DataONE Provenance Tracking for R.

YesWorkflow is a tool that lets users mark up scripts with YesWorkflow annotations to reveal the computational steps and dataflows that may be implicit in the scripts. In addition, YesWorkflow can query the prospective and retrospective provenance of the scripts. Examples include YW-TaPP-15-Recon, YW-Matlab, and YW-noWorkflow.
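As a rough illustration of what such markup looks like, the Python script below carries YesWorkflow annotations in ordinary comments. The keywords (@begin, @end, @in, @out, @desc) are YesWorkflow's; the script itself, its function, and its data names are invented for this example and do not come from the demos listed above.

```python
# @begin summarize_temperatures @desc Clean temperature readings and compute their mean.
# @in readings @desc Raw readings in degrees Celsius; None marks a missing value.
# @out mean_temp @desc Mean of the valid readings.
def summarize_temperatures(readings):

    # @begin drop_missing @desc Discard missing readings.
    # @in readings
    # @out valid_readings
    valid_readings = [r for r in readings if r is not None]
    # @end drop_missing

    # @begin compute_mean @desc Average the valid readings.
    # @in valid_readings
    # @out mean_temp
    mean_temp = sum(valid_readings) / len(valid_readings)
    # @end compute_mean

    return mean_temp
# @end summarize_temperatures


if __name__ == "__main__":
    print(summarize_temperatures([20.0, None, 22.0]))
```

The annotations do not change how the script runs. They declare two nested steps, drop_missing and compute_mean, connected by the valid_readings dataflow, which is the structure a reader would otherwise have to infer from the code.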

With the DataONE provenance tools, each run of a script generates a data package. Once the data package is published and indexed by the coordinating node, the provenance connections among its objects can be viewed and explored at the DataONE Search Site, which facilitates data sharing in the community.