Provenance Demonstration Use Case

We introduce the provenance and search features of DataONE by means of a use case involving three earth scientists who interact through a DataONE member node, as shown below.

A user, “Alice”, annotates a script (MATLAB, R, etc.) with the YesWorkflow (YW) tool to describe the underlying workflow, that is, its prospective provenance. Alice then calls the RunManager record() function, which takes the R/MATLAB script as an argument and records the files read and written by R/MATLAB functions that are registered with the RunManager. After the script has run, the result files, the script, and the prospective and retrospective provenance (represented in the ProvONE provenance model) are bundled into an OAI-ORE compliant data package and published to the DataONE network.
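To make the annotation step concrete, here is a minimal sketch of a script marked up with YesWorkflow comment annotations (@begin, @end, @in, @out, @desc). The use case itself involves MATLAB and R scripts; Python is used here only for illustration, since YW annotations live in ordinary comments. The function name, port names, and sample data are hypothetical and do not come from Alice's actual script.

```python
import csv
import io

# @begin clean_temperatures @desc Drop missing readings and convert Fahrenheit to Celsius.
# @in raw_csv @desc Raw station readings as CSV text (hypothetical input).
# @out clean_rows @desc Cleaned readings in degrees Celsius.
def clean_temperatures(raw_csv):
    # @begin parse_readings @desc Parse the CSV text into one dict per row.
    # @in raw_csv
    # @out readings
    readings = list(csv.DictReader(io.StringIO(raw_csv)))
    # @end parse_readings

    # @begin convert_units @desc Skip empty readings and convert the rest.
    # @in readings
    # @out clean_rows
    clean_rows = [
        {"station": row["station"],
         "temp_c": round((float(row["temp_f"]) - 32) * 5 / 9, 1)}
        for row in readings
        if row["temp_f"] != ""
    ]
    # @end convert_units
    return clean_rows
# @end clean_temperatures
```

The annotations do not change how the script runs. They only declare the steps and the data flowing between them, which is the prospective provenance that is later bundled with the results.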

A second user, “Bob”, discovers Alice’s package and uses her data in his own analysis. Bob calls the RunManager record() function to capture the retrospective and prospective provenance of his script, and then calls the RunManager publish() function to package and publish the data generated by the script execution.
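The recording idea that both Alice and Bob rely on, logging which files registered I/O functions read and write, can be sketched as follows. This is a toy illustration under stated assumptions, not the RunManager API: the class ToyRunRecorder, its register_reader and register_writer methods, and the file names are all hypothetical.

```python
import os
import tempfile


class ToyRunRecorder:
    """Hypothetical stand-in for a run recorder; NOT the DataONE RunManager API."""

    def __init__(self):
        self.files_read = []
        self.files_written = []

    def register_reader(self, func):
        """Wrap a reading function so every path passed to it is logged."""
        def wrapper(path, *args, **kwargs):
            self.files_read.append(os.path.basename(path))
            return func(path, *args, **kwargs)
        return wrapper

    def register_writer(self, func):
        """Wrap a writing function so every path it writes is logged."""
        def wrapper(path, *args, **kwargs):
            result = func(path, *args, **kwargs)
            self.files_written.append(os.path.basename(path))
            return result
        return wrapper


def read_text(path):
    with open(path) as handle:
        return handle.read()


def write_text(path, text):
    with open(path, "w") as handle:
        handle.write(text)


def run_demo():
    recorder = ToyRunRecorder()
    tracked_read = recorder.register_reader(read_text)
    tracked_write = recorder.register_writer(write_text)
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "input.txt")
        # Fixture set-up uses the unregistered function, so it is not logged.
        write_text(source, "12 15 9")
        values = [int(v) for v in tracked_read(source).split()]
        tracked_write(os.path.join(workdir, "total.txt"), str(sum(values)))
    return recorder.files_read, recorder.files_written
```

Only calls made through registered functions appear in the log, which mirrors the statement that files are recorded when they are read or written by functions registered with the RunManager.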

After the coordinating node has indexed the data packages published by Alice and Bob, a third user, “Charlie”, who browses the DataONE Search site can discover the full provenance of Bob’s results.

The ProvONE data model extends the W3C PROV-O standard for representing provenance. It includes specializations for both retrospective provenance, which describes the runtime execution, and prospective provenance, which describes the structure and flow of the analytical script or workflow.
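As a rough illustration of how such provenance links across packages, the sketch below stores a few statements as plain (subject, predicate, object) tuples and walks them backwards from one of Bob's outputs. The predicates prov:used and prov:wasGeneratedBy are W3C PROV terms, and provone:Program and provone:Execution are ProvONE classes. All identifiers (alice:run1, bob:plot.png, and so on) and the upstream helper are hypothetical, and a real package would serialize these statements as RDF rather than Python tuples.

```python
from collections import deque

# Simplified provenance statements; every identifier here is made up.
TRIPLES = [
    # Prospective provenance: the scripts themselves.
    ("alice:script", "rdf:type", "provone:Program"),
    ("bob:script", "rdf:type", "provone:Program"),
    # Retrospective provenance: what happened when the scripts ran.
    ("alice:run1", "rdf:type", "provone:Execution"),
    ("alice:run1", "prov:used", "alice:raw.csv"),
    ("alice:clean.csv", "prov:wasGeneratedBy", "alice:run1"),
    ("bob:run1", "rdf:type", "provone:Execution"),
    ("bob:run1", "prov:used", "alice:clean.csv"),
    ("bob:plot.png", "prov:wasGeneratedBy", "bob:run1"),
]

LINEAGE_PREDICATES = {"prov:used", "prov:wasGeneratedBy"}


def upstream(start, triples):
    """Return everything `start` depends on, in breadth-first order."""
    seen = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, predicate, obj in triples:
            if subject == node and predicate in LINEAGE_PREDICATES and obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen
```

Calling upstream("bob:plot.png", TRIPLES) follows the chain from Bob's plot through his execution to Alice's cleaned file, her execution, and her raw input, which is the kind of cross-package lineage Charlie sees once both packages are indexed.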