Data Management Skillbuilding Hub

Best Practice: Define expected data outcomes and types


Data Life Cycle stage(s): Plan

In the planning process, researchers should carefully consider what data will be produced in the course of their project.

Consider the following:

  • What types of data will be collected (e.g., spatial, temporal, instrument-generated, models, simulations, images, video)?
  • How many data files of each type are likely to be generated during the project? What size will they be?
  • For each type of data file, what are the variables that are expected to be included?
  • What software programs will be used to generate the data?
  • How will the files be organized in a directory structure on a file system or in some other system?
  • Will metadata be stored separately from the data during the project?
  • What is the relationship between the different types of data?
  • Which of the data products are of primary importance and should be preserved for the long term, and which are intermediate working versions of no long-term interest?
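
One way to record answers to the questions above is a small planning-stage inventory. The following is only a minimal sketch in Python: the class name, field names, file counts, and sizes are all hypothetical values chosen for illustration, not figures from any real project.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpectedDataType:
    """One entry in a planning-stage data inventory (all fields are illustrative)."""
    name: str                      # what kind of data this is
    file_format: str               # e.g. "csv" or "txt"
    expected_files: int            # how many files are likely to be generated
    avg_size_kb: int               # rough size of a single file, in kilobytes
    variables: List[str] = field(default_factory=list)  # expected variables
    software: str = ""             # program used to generate the data
    preserve_long_term: bool = False  # primary product vs. intermediate version

    def total_size_kb(self) -> int:
        """Estimated storage for all files of this type."""
        return self.expected_files * self.avg_size_kb


def summarize(inventory: List[ExpectedDataType]) -> Dict[str, int]:
    """Estimate total storage and the share that needs long-term preservation."""
    total = sum(item.total_size_kb() for item in inventory)
    long_term = sum(
        item.total_size_kb() for item in inventory if item.preserve_long_term
    )
    return {"total_kb": total, "long_term_kb": long_term}


# Hypothetical inventory; every number here is an assumption.
inventory = [
    ExpectedDataType("species abundance", "csv", 12, 500,
                     ["site", "date", "species", "count"],
                     "spreadsheet software", preserve_long_term=True),
    ExpectedDataType("field notes", "txt", 120, 10,
                     ["free text"], "text editor", preserve_long_term=True),
    ExpectedDataType("intermediate model runs", "nc", 40, 250000,
                     [], "simulation software", preserve_long_term=False),
]

summary = summarize(inventory)
print(summary)
```

Separating the long-term total from the overall total makes it easier to see how much working storage the project needs compared with how much must eventually go to a repository.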

When preparing a data management plan, defining the types of data that will be generated helps in planning for short-term organization, the analyses to be conducted, and long-term data storage.

Rationale

Considering data outcomes for the project helps researchers anticipate budgetary, software, storage, and personnel needs, and choose an appropriate repository for long-term preservation.

Additional Information

Graham, A., McNeill, K., Stout, A., & Sweeney, L. (2010). Data Management and Publishing. Last modified November 29, 2010. https://libraries.mit.edu/data-management/

Van den Eynden, V., Corti, L., Woollard, M., & Bishop, L. (2011). Managing and Sharing Data: A Best Practice Guide for Researchers. Published May 2011. http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

Examples

“The project will result in spreadsheets of species abundance. One spreadsheet file (saved as .csv) will be generated for each site, and within each spreadsheet there will be data from multiple sampling dates. We will also generate text files that document observations by the researcher during data collection in the field. There will be a single text file for each site and each collection date.”
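
The file layout described in this example can be enumerated ahead of time to check how many files to expect. The sketch below assumes hypothetical site names, collection dates, and a file naming pattern; none of these come from the example itself.

```python
from datetime import date
from itertools import product


def expected_files(sites, dates):
    """List the files implied by the plan: one .csv per site,
    and one text file per site and collection date."""
    csv_files = [f"{site}_abundance.csv" for site in sites]
    note_files = [
        f"{site}_{d.isoformat()}_notes.txt" for site, d in product(sites, dates)
    ]
    return csv_files, note_files


# Hypothetical sites and sampling dates, for illustration only.
sites = ["siteA", "siteB"]
dates = [date(2011, 5, 1), date(2011, 6, 1), date(2011, 7, 1)]

csv_files, note_files = expected_files(sites, dates)
print(len(csv_files), "spreadsheet files;", len(note_files), "field note files")
```

With two sites and three collection dates, this plan implies two spreadsheet files and six field note files, which gives a concrete starting point for estimating storage and organizing directories.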

Cite this best practice:

Carly Strasser, Sharon Farb, Thorny Staples, DataONE (May 11, 2011) "Best Practice: Define expected data outcomes and types". Accessed through the Data Management Skillbuilding Hub at https://dataoneorg.github.io/Education/bestpractices/define-expected-data on Mar 01, 2024

