Best Practice: Describe
Select a Best Practice below to learn more about the “Describe” stage in the Data Life Cycle.
What is the “Describe” stage?
Document data by describing the why, who, what, when, where, and how of the data. Metadata, or data about data, are key to data sharing and reuse, and many tools such as standards and software are available to help describe data.
More information can be found in the Best Practices Primer.
Assign descriptive file names
File names should reflect the contents of the file and include enough information to uniquely identify the data file. File names may contain information such as project acronym, study title, location, investigator, year(s) of study, data type, version n...
Tags: access describe discover format
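As a rough illustration of the naming advice above, the sketch below joins a few descriptive components into one file name. The function name, the component order, the separators, and the sample values (such as the project acronym "SEV") are all hypothetical choices, not a convention prescribed by this page.

```python
import re


def build_file_name(project, location, year, data_type, version, ext="csv"):
    """Join descriptive parts into one file name (hypothetical scheme)."""
    parts = [project, location, str(year), data_type, f"v{version}"]
    # Replace runs of characters other than letters, digits, or hyphens
    # with a single hyphen so the name is portable across systems.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", part).strip("-") for part in parts]
    return "_".join(cleaned) + "." + ext


# Sample values are invented for illustration only.
print(build_file_name("SEV", "Five Points", 2011, "soil temp", 2))
# SEV_Five-Points_2011_soil-temp_v2.csv
```

Underscores separate the components and hyphens replace spaces inside a component, so the name can be split back into its parts later.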
Choose and use standard terminology to enable discovery
Terms and phrases that are used to represent categorical data values or for creating content in metadata records should reflect appropriate and accepted vocabularies in your community or institution. Methods used to identify and select the proper termin...
Tags: controlled vocabulary describe documentation metadata ontologies preserve standards
Confirm a match between data and their description in metadata
To ensure that metadata correctly describes what is actually in a data file, visual inspection or analysis should be done by someone not otherwise familiar with the data and its format. This will ensure that the metadata is sufficient to describe the da...
Tags: assure data consistency describe documentation metadata quality
Create a data dictionary
A data dictionary provides a detailed description for each element or variable in your dataset and data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and ...
Tags: controlled vocabulary describe documentation metadata terminology units
Define the data model
A data model documents and organizes data, how it is stored and accessed, and the relationships among different types of data. The model may be abstract or concrete.
Tags: access data model describe plan
Define the parameters
The parameters reported in the data set need to have names that clearly describe the contents. Ideally, the names should be standardized across files, data sets, and projects, in order that others can readily use the information.
Tags: describe documentation metadata parameter
Describe format for spatial location
Spatial coordinates should be reported in decimal degrees to at least 4 (preferably 5 or 6) decimal places. A value of +/- 0.00001 degrees corresponds to about 1.11 meters at the equator. This does not include uncertainty intr...
Tags: access describe format geospatial location
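A minimal sketch of reporting coordinates in decimal degrees follows. It assumes the source values are in degrees, minutes, and seconds with a hemisphere letter; the function names and the sample coordinate are hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by the usual convention.
    return -value if hemisphere in ("S", "W") else value


def format_coordinate(value, places=5):
    """Format a coordinate with a fixed number of decimal places."""
    return f"{value:.{places}f}"


# Sample coordinate is invented for illustration only.
latitude = dms_to_decimal(35, 55, 30, "N")
print(format_coordinate(latitude))
# 35.92500
```

Printing a fixed number of decimal places states the reported precision only; it does not say anything about the accuracy of the underlying measurement.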
Describe formats for date and time
For dates, always include a four-digit year and use numbers for months. For example, the date format yyyy-mm-dd would appear as 2011-03-15 (March 15, 2011).
Tags: date describe format standards time
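The yyyy-mm-dd format described above can be produced with Python's standard `datetime` module. This is only a sketch: the helper name `to_iso_date` is hypothetical, and the month-name pattern `%B` assumes English month names under the default locale.

```python
from datetime import datetime


def to_iso_date(text, source_format):
    """Parse a date written in source_format and return it as yyyy-mm-dd."""
    return datetime.strptime(text, source_format).date().isoformat()


print(to_iso_date("March 15, 2011", "%B %d, %Y"))
# 2011-03-15
print(to_iso_date("3/15/2011", "%m/%d/%Y"))
# 2011-03-15
```

Stating the source format explicitly avoids guessing whether a value such as 3/4/2011 means March 4 or April 3.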
Describe measurement techniques
Data measurement descriptions should:
Tags: access calibration
Describe method to create derived data products
When describing the process for creating derived data products, the following information should be included in the data documentation or the companion metadata file:
Tags: analyze data processing describe provenance
Describe the contents of data files
A description of the contents of the data file should contain the following:
Tags: describe documentation format metadata parameter units
Describe the overall organization of your dataset
Data sets or collections are often composed of multiple files that are related. Files may have come from (or still be stored in) a relational database, and the relationships among the data tables or other entities are important if the data are to be reu...
Tags: data model database describe documentation metadata
Describe the research project
The research project description should contain the following information:
Tags: annotation data creators describe geography geospatial measurement
Describe the sensor network
If your project uses a sensor network, you should describe and document that network and the instruments it uses. This information is essential to understanding and interpreting the data you use, and should be included as a part of the metadata generate...
Tags: calibration
Describe the spatial extent and resolution of your dataset
The spatial extent of your data set or collection as a whole should be described. The minimum acceptable description would be a bounding box describing the northernmost, southernmost, westernmost, and easternmost limits of the data.
Tags: describe documentation geospatial location measurement metadata
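A bounding box of the kind described above can be derived from the sampling locations themselves. The sketch below assumes points are given as (latitude, longitude) pairs in decimal degrees and that the data do not cross the 180-degree meridian; the function name and the sample points are hypothetical.

```python
def bounding_box(points):
    """Return the north, south, east, and west limits of (lat, lon) points.

    Assumes decimal degrees and no crossing of the 180-degree meridian.
    """
    latitudes = [lat for lat, lon in points]
    longitudes = [lon for lat, lon in points]
    return {
        "north": max(latitudes),
        "south": min(latitudes),
        "east": max(longitudes),
        "west": min(longitudes),
    }


# Sample points are invented for illustration only.
sites = [(35.1, -106.6), (34.3, -106.9), (34.9, -105.8)]
print(bounding_box(sites))
# {'north': 35.1, 'south': 34.3, 'east': -105.8, 'west': -106.9}
```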
Describe the temporal extent and resolution of your dataset
The temporal extent over which the data within your dataset or collection was acquired or collected should be described. Normally this is done by providing...
Tags: date describe documentation measurement metadata time
Describe the units of measurement for each observation
The units of reported parameters need to be explicitly stated in the data file and in the documentation. We recommend SI units (the International System of Units) but recognize that each discipline has its own commonly used units of measure. The critica...
Tags: describe measurement units
Document and store data using stable file formats
File formats are important for understanding how data can be used and possibly integrated. The following issues need to be documented: Does the file format of the data adhere to one or more standards? Is that file standard an open (i.e. open source...
Tags: documentation format metadata preserve storage tabular
Document steps used in data processing
Different types of new data may be created in the course of a project, for instance visualizations, plots, statistical outputs, a new dataset created by integrating multiple datasets, etc. Whenever possible, document your workflow (the process used to c...
Tags: analyze data processing describe integrate provenance replicable data
Document taxonomic information
Identification of any species represented in the data set should be as complete as possible.
Tags: describe metadata standards taxonomy terminology
Document your data organization strategy
The following are strategies for effective data organization: Sparse matrix: Optimal data models for storing data avoid sparse matrices, i.e. if many data points within a matrix are empty, a data table with a column for parameters and a column for val...
Tags: data management plan data model data normalization database describe
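One reading of the sparse-matrix strategy above is to reshape a wide table with many empty cells into a long table with one row per recorded measurement. The sketch below assumes that interpretation; the column names "parameter" and "value", the helper `to_long`, and the sample rows are hypothetical.

```python
def to_long(rows, key_fields):
    """Reshape wide rows into one row per non-empty measurement."""
    long_rows = []
    for row in rows:
        keys = {field: row[field] for field in key_fields}
        for name, value in row.items():
            # Skip identifying columns and empty cells.
            if name in key_fields or value is None:
                continue
            long_rows.append({**keys, "parameter": name, "value": value})
    return long_rows


# Sample rows are invented for illustration only; None marks an empty cell.
wide_rows = [
    {"site": "A", "date": "2011-03-15", "temp_c": 12.1, "ph": None},
    {"site": "B", "date": "2011-03-15", "temp_c": None, "ph": 7.2},
]
for record in to_long(wide_rows, ["site", "date"]):
    print(record)
```

Empty cells simply produce no row in the long table, so the stored table holds only the measurements that exist.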
Ensure flexible data services for virtual datasets
In order for a large dataset to be effectively used by a variety of end users, the following procedures for preparing a virtual dataset are recommended:
Tags: data archives data services describe preserve
Identify and use relevant metadata standards
Many times significant overlap exists among metadata content standards. You should identify those standards that include the fields needed to describe your data. In order to describe your data, you need to decide what information is required for data us...
Tags: controlled vocabulary describe documentation format metadata preserve
Maintain consistent data typing
Choose the right data type and precision for data in each column. As examples: (1) use date fields for dates; and (2) use numeric fields with an appropriate number of decimal places. Comments and explanations should not be included in a column that is meant to inclu...
Tags: database describe documentation format metadata
Provide a citation and document provenance for your dataset
For appropriate attribution and provenance of a dataset, the following information should be included in the data documentation or the companion metadata file:
Tags: citation data creators data source describe preserve provenance
Provide capabilities for tagging and annotation of your data by the community
People have different perspectives on what data means to them, and how it can be used and interpreted in different contexts. Data users ranging from community participants to researchers in different domains can provide unique and valuable insights into...
Tags: annotation controlled vocabulary describe documentation metadata
Provide identifier for dataset used
In order to ensure replicable data access:
Tags: access data consistency describe preserve provenance replicable data
Separate data values from annotations
A separate column should be used for data qualifiers, descriptions, and flags, otherwise there is the potential for problems to develop during analyses. Potential entries in the descriptor column:
Tags: annotation describe documentation flag format metadata
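As a sketch of separating values from annotations, the helper below splits a raw cell that mixes a number with a trailing note into a numeric value and a flag. The helper name and the sample flag words ("estimated", "missing") are hypothetical and are not the descriptor entries this page lists.

```python
import re


def split_value_and_flag(raw):
    """Split a cell such as '12.5 estimated' into (value, flag)."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*(.*?)\s*", raw)
    if match is None:
        # No leading number: keep the whole text as the flag.
        return None, raw.strip()
    value, flag = match.groups()
    return float(value), flag


# Sample cells are invented for illustration only.
for cell in ["12.5 estimated", "7.1", "missing"]:
    print(split_value_and_flag(cell))
# (12.5, 'estimated')
# (7.1, '')
# (None, 'missing')
```

With the flag in its own column, the value column stays purely numeric and can be summarized without first filtering out text.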
Sharing data: legal and policy considerations
All research requires the sharing of information and data. The general philosophy is that data are freely and openly shared. However, funding organizations and institutions may require that their investigators cite the impact of their work, including sh...
Tags: access citation data creators data source
Use appropriate field delimiters
Delimit the columns within a data table using commas or tabs; these are listed in order of preference. Semicolons are used in many systems as line end delimiters and may cause problems if data are imported into those systems (e.g. SAS, PHP scripts). Avo...
Tags: access collect describe format
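Comma- or tab-delimited output can be written with Python's standard `csv` module, which quotes any field that contains the delimiter. The helper name `to_delimited` and the sample rows below are hypothetical.

```python
import csv
import io


def to_delimited(rows, delimiter=","):
    """Serialize rows as delimited text, quoting fields only when needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Sample rows are invented for illustration only.
rows = [["site", "species", "count"], ["A", "Quercus alba, seedling", 3]]
print(to_delimited(rows), end="")
# site,species,count
# A,"Quercus alba, seedling",3
print(to_delimited(rows, delimiter="\t"), end="")
```

Letting the library handle quoting means a comma inside a field does not shift the columns when the file is read back.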
Use consistent codes
Be consistent in the use of codes to indicate categorical variables, for example species names, sites, or land cover types. Codes should always be the same within one data set. Pay particular attention to spelling and case; most frequent problems are wi...
Tags: coding collect controlled vocabulary describe ontologies
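A simple check for the case and spacing problems mentioned above is to group codes that become identical once case and surrounding whitespace are ignored. This is a sketch only: the function name and the sample codes are hypothetical, and it does not catch misspellings.

```python
from collections import defaultdict


def find_inconsistent_codes(codes):
    """Group codes that differ only in case or surrounding whitespace."""
    variants = defaultdict(set)
    for code in codes:
        variants[code.strip().lower()].add(code)
    # Report only the codes that appear in more than one written form.
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}


# Sample codes are invented for illustration only.
print(find_inconsistent_codes(["QUAL", "qual", "Qual ", "ACRU"]))
# {'qual': ['QUAL', 'Qual ', 'qual']}
```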