Best Practices

Advertise your data using datacasting tools

To make your data available using standard and open software tools, you should:

  • Use standard language and terms to clearly communicate to others that your data are available for reuse and that you expect ethical and appropriate use of your data
  • Use an open source datacasting (RSS or other type) service that enables you to advertise your data and the options for others to obtain access to it (RSS, GeoRSS, DatacastingRSS)
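
As a rough illustration of the second point, the sketch below builds a minimal RSS 2.0 feed that announces one dataset, using only the Python standard library. The function name `build_data_feed`, the dictionary keys, the example.org URLs, and the wording of the usage terms are all assumptions made for this example; a GeoRSS or DatacastingRSS feed would need additional elements that are not shown here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_data_feed(channel_title, channel_link, datasets):
    """Return an RSS 2.0 document (as a string) that announces datasets."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Newly available datasets"

    for ds in datasets:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ds["title"]
        ET.SubElement(item, "link").text = ds["url"]
        # State the reuse expectations in plain language next to the access link.
        ET.SubElement(item, "description").text = ds["usage_terms"]
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(ds["published"])

    return ET.tostring(rss, encoding="unicode")


# Hypothetical dataset announcement; every value here is made up.
feed_xml = build_data_feed(
    "Example Lab data releases",
    "https://example.org/data",
    [{
        "title": "Stream temperature, 2019-2021, version 2",
        "url": "https://example.org/data/stream_temp_v2.csv",
        "usage_terms": "Available for reuse. Please cite the dataset and "
                       "contact the authors before publishing results.",
        "published": datetime(2022, 1, 15, tzinfo=timezone.utc),
    }],
)
print(feed_xml)
```

Publishing the resulting text at a stable URL would let feed readers pick up new releases, although the choice of hosting and feed service is outside the scope of this sketch.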

Assign descriptive file names

File names should reflect the contents of the file and include enough information to uniquely identify the data file. File names may contain information such as project acronym, study title, location, investigator, year(s) of study, data type, version number, and file type.

When choosing a file name, check for any database management limitations on file name length and use of special characters. In general, lower-case names are less dependent on particular software and platforms. Avoid using...
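
One possible way to apply these suggestions is to assemble file names from their descriptive parts with a small helper. The sketch below is only an illustration: the function name `make_file_name`, the choice and order of parts, the rule that replaces anything other than letters, digits, and hyphens, and the 64-character limit are all assumptions, and the real limits should come from the systems that will store the files.

```python
import re


def make_file_name(project, location, years, data_type, version, extension,
                   max_length=64):
    """Join descriptive parts into a lower-case name without special characters."""
    parts = [project, location, years, data_type, f"v{version}"]
    cleaned = []
    for part in parts:
        # Lower-case the part, then collapse every run of characters other
        # than letters, digits, or hyphens into a single hyphen.
        slug = re.sub(r"[^a-z0-9-]+", "-", str(part).lower()).strip("-")
        if slug:
            cleaned.append(slug)
    name = "_".join(cleaned) + "." + extension.lower().lstrip(".")
    if len(name) > max_length:
        raise ValueError(f"{name!r} has {len(name)} characters; limit is {max_length}")
    return name


# Hypothetical project, location, and data type.
print(make_file_name("Lake Study", "North Basin", "2019-2021", "soil temp", 2, "CSV"))
# prints lake-study_north-basin_2019-2021_soil-temp_v2.csv
```

Raising an error for over-long names, rather than silently truncating them, keeps two different files from ending up with the same shortened name.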

Backup your data

To avoid accidental loss of data, you should:

  • Back up your data at regular intervals
    • When you complete your data collection activity
    • After you make edits to your data
  • Streaming data should be backed up at regularly scheduled points in the collection process
    • High-value data should be backed up daily or more often
    • Automation simplifies frequent backups
  • Backup strategies (e.g., full, incremental, differential,...
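
To show how automation can simplify frequent backups, here is a minimal sketch of a full (not incremental or differential) backup of a single file. It copies the file under a timestamped name and compares checksums to confirm the copy. The function names, the naming pattern, and the choice of SHA-256 are assumptions made for this example, not a recommended backup system.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_file(source, backup_dir):
    """Copy source into backup_dir under a timestamped name and verify the copy."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # A UTC timestamp in the name keeps earlier backups from being overwritten,
    # as long as backups are taken at least one second apart.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} failed verification")
    return target
```

A call such as `back_up_file("observations.csv", "backups")` (both paths hypothetical) could be run from a scheduler after each collection activity or edit. Keeping copies in a separate location is still needed and is not handled here.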

Check data and other outputs for print and web accessibility

To maximize usability of your data or outputs, ensure that those with impairments or disabilities will still be able to access and understand them. The Web Accessibility Initiative, from the W3C, suggests that those producing content for others consider the following (text from their website):

Make your outputs perceivable

  • Provide text alternatives for non-text content.
  • Provide captions and other alternatives for multimedia.
  • Create content that can be...

Choose and use standard terminology to enable discovery

Terms and phrases that are used to represent categorical data values or to create content in metadata records should reflect appropriate and accepted vocabularies in your community or institution. Methods used to identify and select the proper terminology include:

  • Identify the relevant descriptive terms used as categorical values in your community before the project starts (e.g., standard terms describing soil horizons, plant taxonomy, sampling methodology or equipment, etc...
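
Once a vocabulary has been chosen, categorical values can be checked against it automatically. The sketch below assumes a hypothetical list of soil horizon codes and a helper named `find_nonstandard_terms`. Treating terms as case-insensitive is a choice made for this example and may not suit every vocabulary.

```python
def find_nonstandard_terms(values, vocabulary):
    """Return the distinct values that are not in the agreed vocabulary, sorted."""
    allowed = {term.strip().lower() for term in vocabulary}
    return sorted({v for v in values if v.strip().lower() not in allowed})


# Hypothetical vocabulary of soil horizon codes agreed on before the project.
SOIL_HORIZONS = ["O", "A", "E", "B", "C", "R"]

observed = ["A", "B", "topsoil", "C", "b", "Bedrock"]
print(find_nonstandard_terms(observed, SOIL_HORIZONS))
# prints ['Bedrock', 'topsoil']
```

Running a check like this when data are entered, rather than at the end of the project, makes it easier to correct a term while the person who recorded it is still available.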

Communicate data quality

Information about quality control and quality assurance is an important component of the metadata:

  • Qualify (flag) data that have been identified as questionable by including a flagging_column next to the column of data values. The two columns should be properly associated through a naming convention such as Temperature, flag_Temperature.
  • Describe the quality control methods applied and their assumptions in the metadata. Describe any software used when performing the...
Tags: assure, flag, quality
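
The flagging convention described above (a value column paired with a flag_ column) might look like the following in practice. This is a sketch under stated assumptions: the function name `flag_out_of_range`, the flag code "Q", and the plausible range of -5 to 40 are invented for the example, and a simple range check is only one of many possible quality control tests.

```python
def flag_out_of_range(rows, column, low, high, flag="Q"):
    """Return copies of rows with a flag_<column> entry added.

    The flag is set when the value is missing or falls outside [low, high].
    """
    flag_column = f"flag_{column}"
    flagged = []
    for row in rows:
        new_row = dict(row)  # copy, so the original data values stay untouched
        value = new_row[column]
        questionable = value is None or not (low <= value <= high)
        new_row[flag_column] = flag if questionable else ""
        flagged.append(new_row)
    return flagged


# Hypothetical stream temperature readings in degrees Celsius.
readings = [
    {"Date": "2021-06-01", "Temperature": 14.2},
    {"Date": "2021-06-02", "Temperature": 98.6},  # implausible for this stream
    {"Date": "2021-06-03", "Temperature": None},  # missing reading
]
for row in flag_out_of_range(readings, "Temperature", low=-5.0, high=40.0):
    print(row)
```

The questionable values are kept rather than deleted, so later users can decide for themselves whether to include them. The meaning of each flag code and the range used would still need to be described in the metadata.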

Confirm a match between data and their description in metadata

To ensure that the metadata correctly describe what is actually in a data file, visual inspection or analysis should be done by someone not otherwise familiar with the data and its format. This helps confirm that the metadata are sufficient to describe the data. For example, statistical software can be used to summarize the data contents and check that the data types, ranges, and (for categorical data) the values found are as described in the documentation/metadata.
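
The kind of summary check mentioned in the example can be sketched in a few lines. The code below compares observed values with a documented type, range, and set of categories. The metadata layout (keys `type`, `min`, `max`, `allowed`), the column names, and the function name are assumptions for illustration; real metadata standards structure this information differently.

```python
def check_against_metadata(rows, metadata):
    """Compare each column's observed values with its documented description.

    metadata maps a column name to a dict with optional keys
    'type', 'min', 'max', and 'allowed' (hypothetical field names).
    Returns a list of human-readable discrepancies.
    """
    problems = []
    for column, spec in metadata.items():
        values = [row[column] for row in rows if row.get(column) is not None]
        if "type" in spec:
            wrong = [v for v in values if not isinstance(v, spec["type"])]
            if wrong:
                problems.append(
                    f"{column}: {len(wrong)} value(s) are not {spec['type'].__name__}")
                continue  # range checks are not meaningful on mixed types
        if "min" in spec and values and min(values) < spec["min"]:
            problems.append(
                f"{column}: minimum {min(values)} is below documented {spec['min']}")
        if "max" in spec and values and max(values) > spec["max"]:
            problems.append(
                f"{column}: maximum {max(values)} is above documented {spec['max']}")
        if "allowed" in spec:
            extra = sorted(set(values) - set(spec["allowed"]))
            if extra:
                problems.append(f"{column}: undocumented categories {extra}")
    return problems


# Hypothetical documentation and data.
documented = {
    "depth_cm": {"type": float, "min": 0.0, "max": 200.0},
    "horizon": {"type": str, "allowed": ["O", "A", "B", "C"]},
}
data = [
    {"depth_cm": 5.0, "horizon": "A"},
    {"depth_cm": 250.0, "horizon": "B"},
    {"depth_cm": 40.0, "horizon": "X"},
]
for problem in check_against_metadata(data, documented):
    print(problem)
```

A discrepancy does not say which side is wrong. It may point to an error in the data or to documentation that needs updating, so a reviewer still has to decide how to resolve each one.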

Consider the compatibility of the data you are integrating

The integration of multiple data sets from different sources requires that they be compatible. Methods used to create the data should be considered early in the process, to avoid problems later during attempts to integrate data sets. Note that just because data can be integrated does not necessarily mean that they should be, or that the final product can meet the needs of the study. Where possible, clearly state situations or conditions where it is and is not appropriate to use your data,...

Create a data dictionary

A data dictionary provides a detailed description for each element or variable in your dataset and data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description. A data dictionary provides a concise guide to understanding and using the data.
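
As an illustration, a small data dictionary can be kept as structured records and exported to a CSV file that travels with the data. The variables, allowed values, and units below are hypothetical, and the column layout is just one reasonable arrangement of the items listed above.

```python
import csv
import io

# Hypothetical entries: one record per variable in the dataset.
DATA_DICTIONARY = [
    {"name": "site_id", "description": "Sampling site identifier",
     "data_type": "string", "allowed_values": "S01 to S12", "units": ""},
    {"name": "sample_date", "description": "Date the sample was collected",
     "data_type": "date (YYYY-MM-DD)", "allowed_values": "", "units": ""},
    {"name": "depth_cm", "description": "Depth below the soil surface",
     "data_type": "float", "allowed_values": "0 to 200", "units": "cm"},
]


def data_dictionary_to_csv(entries):
    """Serialize data dictionary entries to CSV text, one row per variable."""
    fields = ["name", "description", "data_type", "allowed_values", "units"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


print(data_dictionary_to_csv(DATA_DICTIONARY))
```

Keeping the dictionary in a machine-readable form like this also makes it possible to reuse it for automated checks of the data.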

Create and document a data backup policy

A backup policy helps manage users' expectations and provides specific guidance on the "who, what, when, and how" of the data backup and restore process. There are several benefits to documenting your data backup policy:

  • Helps clarify the policies, procedures, and responsibilities
  • Allows you to dictate:
    • where backups are located
    • who can access backups and how they can be contacted
    • how often data should be backed up...