Data Management Skillbuilding Hub

Best Practice: Identify data with long-term value


Data Life Cycle stage(s): Preserve

As part of the data life cycle, research data will be contributed to a repository to support preservation and discovery. A research project may generate many different iterations of the same dataset - for example, the raw data from the instruments, as well as datasets that already include computational transformations of the data.

Not every iteration needs to be preserved. To focus resources and attention on the datasets that matter most (the core datasets), the project team should define these core data assets as early in the process as possible, preferably at the conceptual stage and in the data management plan. It may be helpful to speak with your local data archivist or librarian to determine which datasets (or iterations of datasets) should be considered core, and which should be discarded. The core datasets will be the basis for publications, and they require thorough documentation and description.

  • Only datasets with significant long-term value should be contributed to a repository, so the team must decide which datasets to keep.
  • Data that cannot be recreated, or that is costly to reproduce, should be saved.
  • Candidate data to save fall into four categories: observational, experimental, simulation, and derived (or compiled).
  • Your funder or institution may have requirements and policies governing contribution to repositories.

Given the amount of data produced by scientific research, keeping everything is neither practical nor economically feasible.
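
The criteria above can be recorded in a simple project inventory so that keep-or-discard decisions are explicit and repeatable. The following is a minimal sketch in Python, not a prescribed DataONE method: the Dataset fields, the should_preserve function, and the example dataset names are all hypothetical, and "costly to reproduce" is left as a judgment the project team makes for itself.

```python
from dataclasses import dataclass

# The four categories of data named in the list above.
CATEGORIES = {"observational", "experimental", "simulation", "derived"}


@dataclass
class Dataset:
    """One entry in a hypothetical project data inventory."""
    name: str
    category: str                     # one of CATEGORIES
    can_be_recreated: bool            # could the data be regenerated at all?
    costly_to_reproduce: bool         # a project-level judgment, not a fixed threshold
    required_by_policy: bool = False  # funder or institution requires deposit


def should_preserve(ds: Dataset) -> bool:
    """Illustrative rule of thumb: keep data that is irreplaceable,
    costly to redo, or required by a funder or institutional policy."""
    if ds.category not in CATEGORIES:
        raise ValueError(f"unknown category: {ds.category!r}")
    return (not ds.can_be_recreated) or ds.costly_to_reproduce or ds.required_by_policy


# Hypothetical inventory entries, for illustration only.
inventory = [
    Dataset("raw_sensor_readings", "observational",
            can_be_recreated=False, costly_to_reproduce=True),
    Dataset("model_run_scratch", "simulation",
            can_be_recreated=True, costly_to_reproduce=False),
    Dataset("cleaned_site_summary", "derived",
            can_be_recreated=True, costly_to_reproduce=False,
            required_by_policy=True),
]

to_preserve = [ds.name for ds in inventory if should_preserve(ds)]
print(to_preserve)
```

A rule like this is only a starting point for discussion with a data archivist or librarian. It does not replace the appraisal itself, and a real project would likely weigh additional factors such as documentation quality and repository requirements.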

Rationale

Decisions about what data to keep will help to focus project resources on those data that should be stored for long-term preservation.

Additional Information

Whyte, Angus. Appraise and Select Research Data for Curation. Digital Curation Centre.



Cite this best practice:

Gunter Waibel, DataONE (May 11, 2011) "Best Practice: Identify data with long-term value". Accessed through the Data Management Skillbuilding Hub on Aug 22, 2019.


