Data Management Skillbuilding Hub

Decide what data to preserve

Data Life Cycle stage(s): Preserve

The process of science generates a variety of products that are worthy of preservation. Researchers should consider all elements of the scientific process in deciding what to preserve:

  • Raw data
  • Tables and databases of raw or cleaned observation records and measurements
  • Intermediate products, such as partly summarized or coded data that are the input to the next step in an analysis
  • Documentation of the protocols used
  • Software or algorithms developed to prepare data (cleaning scripts) or perform analyses
  • Results of an analysis, which can themselves be starting points or ingredients in future analyses, e.g., distribution maps, population trends, and mean measurements
  • Any data sets obtained from others that were used in data processing
  • Multimedia, whether it documents procedures or serves as standalone data

When deciding which data products to preserve, researchers should weigh the costs of preservation:

  • Raw data are usually worth preserving
  • Consider storage space requirements when deciding whether to preserve data
  • If data can be easily or automatically re-created from the raw data, consider not preserving them. Conversely, data that have undergone quality control and analysis may be costly to reproduce, so consider preserving them
  • Algorithms and software source code cost very little to preserve
  • Results of analyses may be particularly valuable for future discovery and cost very little to preserve

Researchers should consider the following goals and benefits of preservation:

  • Enabling re-analysis of the same products to determine whether the same conclusions are reached
  • Enabling re-use of the products for new analysis and discovery
  • Enabling restoration of original products if working datasets are lost

Rationale

To meet multiple goals for preservation, researchers should think broadly about the digital products that their project generates, preserve as many as possible, and plan the appropriate preservation methods for each.

Cite this best practice:

Cindy Parr, Heather Henkel, Keven Comerford, DataONE (May 11, 2011) "Best Practice: Decide what data to preserve". Accessed through the Data Management Skillbuilding Hub at https://dataoneorg.github.io/Education/bestpractices/decide-what-data on May 24, 2019



Hosted by DataONE

In collaboration with the community, DataONE has developed high-quality resources to help educators and librarians with training in data management, including teaching materials, webinars, and a database of best practices for improving data sharing and management.

If you have a question or concern, please open an Issue in this repository on GitHub.