Define expected data outcomes and types
Data Life Cycle stage(s): Plan
In the planning process, researchers should carefully consider what data will be produced in the course of their project.
Consider the following:
- What types of data will be collected (e.g., spatial, temporal, instrument-generated, models, simulations, images, video)?
- How many data files of each type are likely to be generated during the project? What size will they be?
- For each type of data file, what are the variables that are expected to be included?
- What software programs will be used to generate the data?
- How will the files be organized in a directory structure on a file system or in some other system?
- Will metadata be stored separately from the data during the project?
- What is the relationship between the different types of data?
- Which of the data products are of primary importance and should be preserved for the long term, and which are intermediate working versions of no long-term interest?
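One way to make the answers to these questions concrete is to record them in a small machine-readable inventory. The following is a minimal sketch in Python, not a prescribed format: the `ExpectedDataType` class, its field names, and all file counts and sizes are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExpectedDataType:
    """One kind of data file the project expects to produce (hypothetical schema)."""
    name: str
    file_format: str
    n_files: int                      # estimated number of files
    avg_size_mb: float                # estimated average size per file, in MB
    variables: list = field(default_factory=list)
    software: str = ""                # program expected to generate the files
    preserve_long_term: bool = False  # primary product vs. intermediate version

    def total_size_mb(self) -> float:
        """Estimated total size of all files of this type, in MB."""
        return self.n_files * self.avg_size_mb


# Placeholder estimates for illustration only; replace with project values.
inventory = [
    ExpectedDataType("species abundance", "csv", n_files=10, avg_size_mb=0.5,
                     variables=["site", "date", "species", "count"],
                     software="spreadsheet program", preserve_long_term=True),
    ExpectedDataType("field notes", "txt", n_files=120, avg_size_mb=0.01,
                     preserve_long_term=True),
    ExpectedDataType("intermediate model runs", "netcdf", n_files=40,
                     avg_size_mb=250.0),
]

# Working storage covers everything; the archive covers only primary products.
total_mb = sum(d.total_size_mb() for d in inventory)
archive_mb = sum(d.total_size_mb() for d in inventory if d.preserve_long_term)

print(f"Estimated working storage: {total_mb:.1f} MB")
print(f"Estimated long-term archive: {archive_mb:.1f} MB")
```

Separating the working total from the long-term total mirrors the distinction between primary data products and intermediate versions, and gives rough figures to use when budgeting storage or comparing repositories.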
When preparing a data management plan, defining the types of data that will be generated helps in planning for short-term organization, the analyses to be conducted, and long-term data storage.
Considering data outcomes for the project helps in anticipating budgetary, software, storage, and personnel needs, and in choosing an appropriate repository for long-term preservation.
Graham, A., McNeill, K., Stout, A., & Sweeney, L. (2010). Data Management and Publishing. Last modified November 29, 2010. https://libraries.mit.edu/data-management/
Van den Eynden, V., Corti, L., Woollard, M. & Bishop, L. (2011). Managing and Sharing Data: A Best Practice Guide for Researchers. Published May 2011. http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
“The project will result in spreadsheets of species abundance. One spreadsheet file (saved as .csv) will be generated for each site, and within each spreadsheet there will be data from multiple sampling dates. We will also generate text files that document observations by the researcher during data collection in the field. There will be a single text file for each site and each collection date.”
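A description like the one above can be turned into a list of expected files, which helps check the planned file counts and naming scheme before fieldwork begins. The sketch below is illustrative only: the site names, sampling dates, and file-naming pattern are assumptions and do not come from the example project.

```python
from datetime import date
from itertools import product

# Hypothetical sites and sampling dates, for illustration only.
sites = ["siteA", "siteB", "siteC"]
sampling_dates = [date(2011, 5, 1), date(2011, 6, 1)]


def abundance_files(sites):
    """One .csv spreadsheet per site; each holds all sampling dates."""
    return [f"{site}_abundance.csv" for site in sites]


def field_note_files(sites, dates):
    """One text file per site and collection date."""
    return [f"{site}_{d.isoformat()}_notes.txt" for site, d in product(sites, dates)]


csv_files = abundance_files(sites)
note_files = field_note_files(sites, sampling_dates)

print(len(csv_files), "abundance spreadsheets expected")
print(len(note_files), "field-note text files expected")
```

Putting the ISO date in the file name is one possible convention; it keeps the text files for a site in chronological order when sorted by name.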
Cite this best practice: Carly Strasser, Sharon Farb, Thorny Staples, DataONE (May 11, 2011) "Best Practice: Define expected data outcomes and types". Accessed through the Data Management Skillbuilding Hub at https://dataoneorg.github.io/Education/bestpractices/define-expected-data on May 24, 2019.