Data Management Skillbuilding Hub

Best Practice: Document steps used in data processing


Data Life Cycle stage(s): Analyze   Describe   Integrate

Different types of new data may be created in the course of a project, for instance visualizations, plots, statistical outputs, or a new dataset created by integrating multiple datasets. Whenever possible, document your workflow (the process used to clean, analyze, and visualize data), noting what data products are created at each step. Depending on the nature of the project, this documentation might take the form of a computer script, or it may be notes in a text file describing the process you used (i.e., process metadata). If workflows are preserved along with the data products, they can be re-executed, enabling the data products to be reproduced.
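As one illustration of a scripted workflow that also writes its own process metadata, the sketch below runs two processing steps and records, for each step, what was done, which inputs were used, and which data product resulted. It is a minimal example under stated assumptions, not a prescribed DataONE method: the sample data, the column names (site, temp_c), the product names (cleaned.csv, summary.txt), and the WorkflowLog helper are all hypothetical.

```python
import csv
import hashlib
import io
import json
from datetime import datetime, timezone


def sha256_of(text):
    """Return a checksum so a reproduced product can be compared to the original."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class WorkflowLog:
    """Hypothetical helper: collects one process-metadata record per step."""

    def __init__(self):
        self.steps = []

    def record(self, name, description, inputs, product_name, product_text):
        self.steps.append({
            "step": len(self.steps) + 1,
            "name": name,
            "description": description,
            "inputs": inputs,
            "product": product_name,
            "product_sha256": sha256_of(product_text),
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })

    def to_json(self):
        return json.dumps(self.steps, indent=2)


def clean(raw_csv):
    """Cleaning step: drop rows with a missing temperature value."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    kept = [row for row in rows if row["temp_c"].strip() != ""]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["site", "temp_c"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()


def summarize(clean_csv):
    """Analysis step: count the rows and compute the mean temperature."""
    rows = list(csv.DictReader(io.StringIO(clean_csv)))
    temps = [float(row["temp_c"]) for row in rows]
    mean = sum(temps) / len(temps)
    return f"n={len(temps)} mean_temp_c={mean:.2f}\n"


def run_workflow(raw_csv):
    """Run every step in order, logging the data product created at each one."""
    log = WorkflowLog()
    cleaned = clean(raw_csv)
    log.record("clean", "Dropped rows with missing temp_c",
               ["raw.csv"], "cleaned.csv", cleaned)
    summary = summarize(cleaned)
    log.record("summarize", "Computed row count and mean temp_c",
               ["cleaned.csv"], "summary.txt", summary)
    return cleaned, summary, log


# Made-up sample data, for illustration only.
RAW = "site,temp_c\nA,10.0\nB,\nC,14.0\n"
cleaned, summary, log = run_workflow(RAW)
print(log.to_json())
```

Saving the JSON log next to the data products gives a reader both the script that can be re-run and a plain record of what each step produced. The checksums let someone who re-runs the workflow confirm that they obtained the same products.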

Description Rationale

To enable others to verify the quality of a given data product and, ideally, to reproduce it, the steps followed to create that product must be properly documented.

Additional Information

This best practice also applies to other categories, including "Analysis and Visualization" and "Data Documentation".

  • Juliana Freire, Cláudio T. Silva, Steven P. Callahan, Emanuele Santos, Carlos Eduardo Scheidegger, Huy T. Vo: Managing Rapidly-Evolving Scientific Workflows. IPAW 2006: 10-18
  • Juliana Freire, David Koop, Emanuele Santos, Cláudio T. Silva: Provenance for Computational Tasks: A Survey. Computing in Science and Engineering 10(3): 11-21 (2008)


Cite this best practice:

Eric Lind, Juliana Freire, DataONE (May 11, 2011) "Best Practice: Document steps used in data processing". Accessed through the Data Management Skillbuilding Hub at https://dataoneorg.github.io/Education/bestpractices/document-steps-used on May 24, 2019.



Hosted by DataONE

In collaboration with the community, DataONE has developed high quality resources for helping educators and librarians with training in data management, including teaching materials, webinars and a database of best-practices to improve methods for data sharing and management.
