Provide a citation and document provenance for your dataset
For appropriate attribution and provenance of a dataset, the following information should be included in the data documentation or the companion metadata file:
- Name the people responsible for the dataset throughout its lifetime, including for each person:
  - Name
  - Contact information
  - Role (e.g., principal investigator, technician, data manager)
According to the International Polar Year Data and Information Service, an author is the individual or individuals whose intellectual work, such as a particular field experiment or algorithm, led to the creation of the dataset. People responsible for the data can include individuals, groups, compilers, or editors.
- Description of the context of the dataset with respect to a larger project or study (include links and related documentation), if applicable.
- Revision history, including additions of new data and error corrections.
- Links to source data, if the dataset was derived from data in another dataset.
- List of project support (e.g., funding agencies, collaborators, material support).
- Describe how to properly cite the dataset. The data citation should include:
  - All contributors
  - Date of dataset publication
  - Title of dataset
  - Media or URL
  - Data publisher
  - Identifier (Digital Object Identifier)
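The provenance items listed above could be kept in a machine-readable companion metadata file. The following is a minimal Python sketch of one way to do that, not a prescribed schema: the class names, field names, and the `missing_fields` helper are all hypothetical and chosen only for illustration, and the sample person and project are made up.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Person:
    name: str
    contact: str
    role: str  # e.g. "principal investigator", "technician", "data manager"


@dataclass
class Revision:
    date: str         # date of the change, e.g. "2011-09-01"
    description: str  # what was added or corrected


@dataclass
class Provenance:
    # Field names are illustrative assumptions, not a standard schema.
    people: List[Person]
    project_context: str = ""
    revisions: List[Revision] = field(default_factory=list)
    source_datasets: List[str] = field(default_factory=list)
    support: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the record for a companion metadata file."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical example record.
record = Provenance(
    people=[Person("A. Example", "a.example@example.org", "data manager")],
    project_context="Hypothetical field survey within a larger monitoring project",
    revisions=[Revision("2011-09-01", "Initial release")],
)
print(record.missing_fields())
print(record.to_json())
```

Listing the empty fields before publication is a simple way to spot undocumented items, such as source datasets or project support, that apply to the dataset but have not yet been recorded.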
Description Rationale
Documenting a dataset's origin, history, and contact information allows the dataset to be cited properly. Proper citation, in turn, ensures that data providers and publishers receive appropriate credit for their efforts.
Additional Information
The Oak Ridge National Laboratory Distributed Active Archive Center provides guidance and a rationale for citing data sets: Editorial: Citations to Published Data Sets
Buneman P, Khanna S, Tan W. 2001. Why and Where: A Characterization of Data Provenance. Pp. 316-330 in Lecture Notes in Computer Science. Springer Berlin/Heidelberg. https://doi.org/10.1007/3-540-44503-X_20
Osterweil LJ, Clarke LA, Ellison AM, Boose E, Podorozhny R, Wise A. 2010. Clear and precise specification of ecological data management processes and dataset provenance. IEEE Transactions on Automation Science and Engineering 7(1):189-195. https://doi.org/10.1109/TASE.2009.2021774
Simmhan YL, Plale B, Gannon D. 2005. A survey of data provenance in e-science. ACM SIGMOD Record 34(3):31-36. https://doi.org/10.1145/1084805.1084812
Examples
Turner, D.P., W.D. Ritts, and M. Gregory. 2006. BigFoot NPP Surfaces for North and South American Sites, 2002-2004. Data set. Available on-line http://daac.ornl.gov from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. https://doi.org/10.3334/ORNLDAAC/750
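The citation elements recommended in this best practice (contributors, publication date, title, media, publisher, identifier) can also be assembled programmatically. The sketch below is illustrative only: `join_names` and `format_citation` are hypothetical helpers, and the output layout is a simplified assumption rather than the ORNL DAAC citation style, so it does not reproduce the example above word for word.

```python
def join_names(names):
    """Join contributor names as 'A, B, and C'."""
    if len(names) <= 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + ", and " + names[-1]


def format_citation(contributors, year, title, medium, publisher, doi):
    """Build a citation string, refusing to omit any recommended element."""
    fields = {
        "contributors": contributors,
        "year": year,
        "title": title,
        "medium": medium,
        "publisher": publisher,
        "doi": doi,
    }
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError("missing citation elements: " + ", ".join(missing))

    parts = [join_names(contributors), str(year), title, medium, publisher]
    # Strip trailing periods so parts ending in "." are not doubled up.
    body = ". ".join(part.strip().rstrip(".") for part in parts)
    return f"{body}. https://doi.org/{doi}"


# Elements taken from the example citation above.
citation = format_citation(
    contributors=["Turner, D.P.", "W.D. Ritts", "M. Gregory"],
    year=2006,
    title="BigFoot NPP Surfaces for North and South American Sites, 2002-2004",
    medium="Data set",
    publisher=("Oak Ridge National Laboratory Distributed Active Archive "
               "Center, Oak Ridge, Tennessee, U.S.A."),
    doi="10.3334/ORNLDAAC/750",
)
print(citation)
```

Raising an error when an element is missing, rather than silently skipping it, keeps incomplete citations from being published unnoticed.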
Cite this best practice:
Sherry Lake, DataONE (September 01, 2011) "Best Practice: Provide a citation and document provenance for your dataset". Accessed through the Data Management Skillbuilding Hub at https://dataoneorg.github.io/Education/bestpractices/provide-a-citation on Mar 01, 2024