Data Management Skillbuilding Hub

Best Practice: Confirm a match between data and their description in metadata



Data Life Cycle stage(s): Assure, Describe

To verify that the metadata correctly describe what is actually in a data file, someone not otherwise familiar with the data and their format should inspect or analyze the file using only the metadata as a guide. This confirms that the metadata are sufficient to describe the data. For example, statistical software can be used to summarize the data contents and check that data types, value ranges and, for categorical data, the set of values found all match the documentation/metadata.
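The kind of summary described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration rather than a prescribed tool: the function name summarize_columns and the sample values are hypothetical, and a statistical package would normally give a richer summary.

```python
import csv
import io


def summarize_columns(csv_text):
    """Summarize each column: numeric range, or the distinct values found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            # Treat a missing cell as an empty string.
            columns[name].append((row[name] or "").strip())

    summary = {}
    for name, values in columns.items():
        try:
            # If every value parses as a number, report the observed range.
            numbers = [float(v) for v in values]
            summary[name] = {"type": "numeric", "min": min(numbers), "max": max(numbers)}
        except ValueError:
            # Otherwise treat the column as categorical and list its values.
            summary[name] = {"type": "categorical", "values": sorted(set(values))}
    return summary


# Hypothetical sample data, invented for illustration only.
SAMPLE = """StationID,Temperature
Station1,12.5
Station2,-3.0
Station1,21.4
"""

for column, info in summarize_columns(SAMPLE).items():
    print(column, info)
```

The printed summary can then be compared by eye against the types, ranges, and category values stated in the metadata.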

Description Rationale

Sometimes mistakes in either data or metadata preparation cause discrepancies between the two. These can include missing (or extra) columns of data, mis-ordered columns of data, or discrepant values.

Additional Information


Suppose the metadata describe a dataset that has two columns. The first is defined to be StationID and should contain the station codes “Station1” and “Station2.” The second contains temperature data with a range between -20 and 40 degrees Celsius. However, the data file actually contains three columns: the first contains the temperature, the second the humidity, and the third the StationID, with stations labeled “Stat1”, “Stat2”, and “Stat3”. This sort of problem can occur if data are processed or added after the initial metadata were created, or if mistakes were simply made in preparing the metadata. Having a naive user ingest and analyze the data using only the metadata will expose these problems, and either the metadata or the data can then be corrected so that the two correspond.
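The scenario above can also be checked mechanically. The following Python sketch encodes the metadata from the example and compares it with a file laid out as described. The column names "Temperature" and "Humidity", the data values, and the function name check_against_metadata are assumptions made for illustration; the example itself names only StationID.

```python
import csv
import io

# Metadata as described in the example; the name "Temperature" is an assumption.
METADATA = [
    {"name": "StationID", "allowed": {"Station1", "Station2"}},
    {"name": "Temperature", "min": -20.0, "max": 40.0},
]

# Hypothetical file contents following the example's layout; values are invented.
DATA = """Temperature,Humidity,StationID
12.5,40,Stat1
-3.0,55,Stat2
21.4,61,Stat3
"""


def check_against_metadata(csv_text, metadata):
    """Return a list of readable discrepancies between the data and the metadata."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]  # skip blank lines
    header, records = rows[0], rows[1:]
    expected = [col["name"] for col in metadata]
    problems = []

    # Column-level checks: count, extra columns, missing columns, order.
    if len(header) != len(expected):
        problems.append(f"expected {len(expected)} columns, found {len(header)}")
    for name in header:
        if name not in expected:
            problems.append(f"undocumented column: {name}")
    for name in expected:
        if name not in header:
            problems.append(f"missing column: {name}")
    shared = [n for n in header if n in expected]
    if shared != [n for n in expected if n in header]:
        problems.append(f"column order differs: {shared} vs {expected}")

    # Value-level checks for each documented column present in the file.
    for col in metadata:
        if col["name"] not in header:
            continue
        idx = header.index(col["name"])
        values = [r[idx] for r in records]
        if "allowed" in col:
            unexpected = sorted(set(values) - col["allowed"])
            if unexpected:
                problems.append(f"{col['name']}: unexpected values {unexpected}")
        else:
            out_of_range = [v for v in values if not col["min"] <= float(v) <= col["max"]]
            if out_of_range:
                problems.append(f"{col['name']}: out of range {out_of_range}")
    return problems


for problem in check_against_metadata(DATA, METADATA):
    print(problem)
```

A script like this does not replace review by a naive user, but it makes the column-count, column-order, and station-code mismatches in this example explicit before anyone decides whether the data or the metadata should be corrected.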



Cite this best practice:

Eric Lind, John Porter, Michael Grady, DataONE (May 11, 2011) "Best Practice: Confirm a match between data and their description in metadata". Accessed through the Data Management Skillbuilding Hub on Aug 31, 2020.


