Data Management Skillbuilding Hub

Best Practice: Consider the compatibility of the data you are integrating


Data Life Cycle stage(s): Assure

Integrating multiple data sets from different sources requires that they be compatible. Consider the methods used to create the data early in the process, to avoid problems when you later attempt to integrate data sets. Note that just because data can be integrated does not necessarily mean that they should be, or that the final product can meet the needs of the study. Where possible, clearly state the situations or conditions in which it is and is not appropriate to use your data, and provide information (such as the software used and good metadata) to make integration easier.

Description Rationale

When using integrated data sets, it is crucial that the data are comparable and compatible to avoid mistakes in analyses and interpretation.

Additional Information

Burley, T.E., and Peine, J.D., 2009, NBII-SAIN Data Management Toolkit, U.S. Geological Survey Open-File Report 2009-1170, 96 p. Available from:


Water-quality data collected by two separate agencies may be thematically similar but may have been sampled using completely different methods. Differences in water-quality sampling methods can include equipment, sampling protocol, and lab analysis procedures. Analyses performed on integrated water-quality data that were collected using completely different methods would likely produce questionable results.
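
As a rough illustration of the water-quality example, the sketch below compares the documented methods of two data sets before they are combined. It is only a minimal sketch: the `SamplingMethod` record, its three fields, the `method_differences` helper, and the sample values are all hypothetical and are not taken from any agency's actual metadata or from the toolkit cited above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SamplingMethod:
    """Hypothetical record of how a water-quality data set was produced."""
    equipment: str
    protocol: str
    lab_procedure: str


def method_differences(a: SamplingMethod, b: SamplingMethod) -> list[str]:
    """Return the names of the method fields on which the two records differ."""
    return [f.name for f in fields(SamplingMethod)
            if getattr(a, f.name) != getattr(b, f.name)]


# Illustrative values only; real metadata would come from each data provider.
agency_a = SamplingMethod("grab sampler", "protocol A", "lab method 1")
agency_b = SamplingMethod("automated sampler", "protocol A", "lab method 2")

differences = method_differences(agency_a, agency_b)
if differences:
    print("Review before integrating; methods differ in:", ", ".join(differences))
else:
    print("Documented methods match.")
```

A check like this can only flag differences that are recorded in the metadata. Whether the flagged differences actually rule out integration remains a judgment for someone who knows the methods involved.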



Cite this best practice:

Eric Lind, Steve Aulenback, Tom Burley, DataONE (May 11, 2011) "Best Practice: Consider the compatibility of the data you are integrating". Accessed through the Data Management Skillbuilding Hub on Aug 22, 2019


