Decide what data to preserve
Data Life Cycle stage(s): Preserve
The process of science generates a variety of products that are worthy of preservation. Researchers should consider all elements of the scientific process in deciding what to preserve:
- Raw data
- Tables and databases of raw or cleaned observation records and measurements
- Intermediate products, such as partly summarized or coded data that are the input to the next step in an analysis
- Documentation of the protocols used
- Software or algorithms developed to prepare data (cleaning scripts) or perform analyses
- Results of an analysis, which can themselves be starting points or ingredients in future analyses, e.g. distribution maps, population trends, mean measurements
- Any data sets obtained from others that were used in data processing
- Multimedia, whether it documents procedures or serves as standalone data
When deciding which data products to preserve, researchers should weigh the costs of preservation:
- Raw data are usually worth preserving
- Consider space requirements when deciding whether to preserve data
- If data can be easily or automatically re-created from raw data, consider not preserving them. Conversely, if data have undergone quality control processes and were analyzed, consider preserving them, since reproducing them might be costly
- Algorithms and software source code cost very little to preserve
- Results of analyses may be particularly valuable for future discovery and cost very little to preserve
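The cost considerations above can be sketched as a simple decision rule. This is a minimal illustration, not a prescribed method: the `Product` fields, the category names, and the `budget_gb` storage limit are all hypothetical, and a real project would weigh these factors with more nuance.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """One output of a project (hypothetical fields, for illustration only)."""
    name: str
    kind: str               # e.g. "raw", "intermediate", "code", "result"
    size_gb: float
    easily_recreated: bool  # can it be regenerated cheaply from raw data?


# Assumption: source code and analysis results are small enough to always keep.
LOW_COST_KINDS = {"code", "result"}


def should_preserve(product: Product, budget_gb: float) -> bool:
    """Apply the cost guidance in order: raw data, cheap items, re-creatable items, space."""
    if product.kind == "raw":
        return True                      # raw data are usually worth preserving
    if product.kind in LOW_COST_KINDS:
        return True                      # costs very little to preserve
    if product.easily_recreated:
        return False                     # can be regenerated from raw data
    return product.size_gb <= budget_gb  # otherwise, keep it if space allows
```

For example, a quality-controlled table that would be costly to reproduce is kept when it fits the available space, while an automatically derived summary file is skipped.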
Researchers should consider the following goals and benefits of preservation:
- Enabling re-analysis of the same products to determine whether the same conclusions are reached
- Enabling re-use of the products for new analysis and discovery
- Enabling restoration of original products in the case that working datasets are lost
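One way to support the restoration and re-analysis goals above is to record a checksum for each preserved file, so that a restored copy can be compared against the original. The sketch below is an assumption on our part rather than something this best practice prescribes: the function names are hypothetical, and SHA-256 is simply one commonly used hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are missing or whose contents differ under root."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)
```

A manifest built at preservation time can be stored alongside the data; running `changed_files` on a restored copy then reports any file that is missing or no longer matches.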
Rationale
To meet multiple goals for preservation, researchers should think broadly about the digital products that their project generates, preserve as many as possible, and plan the appropriate preservation methods for each.
Cite this best practice:
Cindy Parr, Heather Henkel, Keven Comerford, DataONE (May 11, 2011) "Best Practice: Decide what data to preserve". Accessed through the Data Management Skillbuilding Hub at https://dataoneorg.github.io/Education/bestpractices/decide-what-data on Mar 01, 2024