Document steps used in data processing

Data Life Cycle stage(s): Analyze, Describe, Integrate

Many kinds of new data may be created in the course of a project: visualizations, plots, statistical outputs, a new dataset created by integrating multiple datasets, and so on. Whenever possible, document your workflow (the process used to clean, analyze, and visualize data), noting which data products are created at each step. Depending on the nature of the project, this documentation might be a computer script or notes in a text file describing the process you used (i.e., process metadata). If workflows are preserved along with the data products, they can be re-executed to reproduce those products.
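As one possible illustration of process metadata, the sketch below records each processing step together with its input files and the data products it creates, then writes the record to a JSON file kept alongside the data. The `ProcessLog` class, the `run_example` function, and all file names are hypothetical and chosen only for this example; they are not part of any DataONE tool.

```python
import csv
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file so a product can be verified later."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class ProcessLog:
    """Collects one record per processing step (hypothetical helper for this sketch)."""

    def __init__(self):
        self.steps = []

    def record(self, description, inputs, outputs):
        """Note what a step did, what it read, and which data products it created."""
        self.steps.append({
            "step": len(self.steps) + 1,
            "description": description,
            "inputs": [str(p) for p in inputs],
            "outputs": [{"file": str(p), "sha256": sha256_of(p)} for p in outputs],
            "finished_utc": datetime.now(timezone.utc).isoformat(),
        })

    def write(self, path):
        """Save the collected step records as human-readable JSON."""
        Path(path).write_text(json.dumps(self.steps, indent=2), encoding="utf-8")


def run_example(workdir):
    """Run a tiny clean-then-analyze workflow and document every step."""
    workdir = Path(workdir)
    # Made-up raw data, written here only so the sketch is self-contained.
    raw = workdir / "raw_temperatures.csv"
    raw.write_text("site,temp_c\nA,12.5\nB,\nC,14.0\n", encoding="utf-8")

    log = ProcessLog()

    # Step 1 (clean): drop rows that have no temperature value.
    cleaned = workdir / "cleaned_temperatures.csv"
    with raw.open(newline="", encoding="utf-8") as src:
        rows = [row for row in csv.DictReader(src) if row["temp_c"]]
    with cleaned.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=["site", "temp_c"])
        writer.writeheader()
        writer.writerows(rows)
    log.record("Dropped rows with a missing temp_c value", [raw], [cleaned])

    # Step 2 (analyze): compute the mean of the remaining temperatures.
    summary = workdir / "summary.txt"
    mean = sum(float(row["temp_c"]) for row in rows) / len(rows)
    summary.write_text(f"mean_temp_c={mean:.2f}\n", encoding="utf-8")
    log.record("Computed mean temperature", [cleaned], [summary])

    log_path = workdir / "process_log.json"
    log.write(log_path)
    return log_path
```

Calling `run_example` on an empty directory leaves the raw file, the two data products, and `process_log.json` side by side. Storing a checksum for each output is a design choice made here so that a later reader can confirm a regenerated product matches the original; a plain text file of notes would serve the same documentary purpose.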

Rationale

To enable others to verify the quality of a given data product and, ideally, to reproduce it, the steps followed to create that product must be properly documented.

Additional Information

This best practice also applies to other categories, including "Analysis and Visualization" and "Data Documentation".

Juliana Freire, Cláudio T. Silva, Steven P. Callahan, Emanuele Santos, Carlos Eduardo Scheidegger, Huy T. Vo: Managing Rapidly-Evolving Scientific Workflows. IPAW 2006: 10-18
Juliana Freire, David Koop, Emanuele Santos, Cláudio T. Silva: Provenance for Computational Tasks: A Survey. Computing in Science and Engineering 10(3): 11-21 (2008)

Data Management Skillbuilding Hub

Hosted by DataONE