Advertise your data using datacasting tools

To make your data available using standard and open software tools you should:

  • Use standard language and terms to clearly communicate to others that your data are available for reuse and that you expect ethical and appropriate use of your data
  • Use an open source datacasting (RSS or other type) service that enables you to advertise your data and the options for others to obtain access to it (RSS, GeoRSS, DatacastingRSS)
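
As a rough illustration of the datacasting idea above, the following minimal sketch builds a plain RSS 2.0 feed with one item announcing a data set. It uses only the Python standard library; the function name `build_feed`, the feed titles, and the example.org URLs are hypothetical placeholders, and a dedicated datacasting service would likely add further elements (for example, spatial extents in GeoRSS).

```python
import xml.etree.ElementTree as ET

def build_feed(channel_title, channel_link, channel_description, items):
    """Return an RSS 2.0 document (as a string) advertising data sets."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for item in items:
        node = ET.SubElement(channel, "item")
        # Each item carries a title, an access link, and a usage statement.
        for tag in ("title", "link", "description"):
            ET.SubElement(node, tag).text = item[tag]
    return ET.tostring(rss, encoding="unicode")

# Hypothetical feed content; replace with your own data set details.
feed_xml = build_feed(
    "Example project data feed",
    "https://example.org/data",
    "Newly released data sets from an example project",
    [
        {
            "title": "Soil temperature, 2010, version 20100104",
            "link": "https://example.org/data/soil-temp-2010.csv",
            "description": "Available for reuse; please cite the data set "
                           "and use it ethically and appropriately.",
        }
    ],
)
print(feed_xml)
```
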
Assign descriptive file names

File names should reflect the contents of the file and include enough information to uniquely identify the data file. File names may contain information such as project acronym, study title, location, investigator, year(s) of study, data type, version number, and file type.

When choosing a file name, check for any database management limitations on file name length and use of special characters. Also, in general, lower-case names are less software and platform dependent. Avoid using spaces and special characters in file names, directory paths and field names, because automated processing, URLs and other systems often use spaces and special characters to parse text strings. Instead, consider using underscores ( _ ) or dashes ( - ) to separate meaningful parts of file names. Avoid $ % ^ & # | : and similar.

If versioning is desired, include a date string in the file name to indicate the version.

Avoid using file names such as mydata.dat or 1998.dat.
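
The naming advice above can be sketched in a few lines of Python. This is one possible convention, not a prescribed one: the helper names `clean_part` and `build_file_name`, the choice of underscores between parts and dashes within parts, the `YYYYMMDD` date format, and the sample project values are all assumptions made for illustration.

```python
import re
from datetime import date

# Anything other than lower-case letters, digits, underscore, or dash is unsafe here.
_UNSAFE = re.compile(r"[^a-z0-9_-]+")

def clean_part(text):
    """Lower-case one name part and replace each unsafe run with a single dash."""
    return _UNSAFE.sub("-", text.strip().lower()).strip("-")

def build_file_name(parts, version_date, extension):
    """Join cleaned parts with underscores and append a date string as the version."""
    cleaned = [clean_part(p) for p in parts]
    cleaned = [p for p in cleaned if p]  # drop parts that were empty after cleaning
    cleaned.append(version_date.strftime("%Y%m%d"))
    return "_".join(cleaned) + "." + extension.lower().lstrip(".")

# Hypothetical project acronym, location, and data type.
name = build_file_name(["PRJX", "North Site", "soil temp"], date(2010, 1, 4), "csv")
print(name)  # prjx_north-site_soil-temp_20100104.csv
```
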

Back up your data

To avoid accidental loss of data you should:

  • Back up your data at regular intervals
    • When you complete your data collection activity
    • After you make edits to your data
  • Streaming data should be backed up at regularly scheduled points in the collection process
    • High-value data should be backed up daily or more often
    • Automation simplifies frequent backups
  • Backup strategies (e.g., full, incremental, differential) should be optimized for the data collection process
  • Create, at a minimum, 2 copies of your data
  • Place one copy at an “off-site” and “trusted” location
    • Commercial storage facility
    • Campus file-server
    • Cloud file-server (e.g., Amazon S3, Carbonite)
  • Use a reliable device when making backups
    • External USB drive (avoid the use of “light-weight” devices, e.g., floppy disks, USB stick-drives; avoid network drives that are intermittently accessible)
    • Managed network drive
    • Managed cloud file-server (e.g., Amazon S3, Carbonite)
  • Ensure backup copies are identical to the original copy
    • Perform differential checks
    • Perform a “checksum” check
  • Document all procedures to ensure a successful recovery from a backup copy
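
The checksum check listed above can be sketched as follows. This is a minimal example assuming SHA-256 is an acceptable checksum for your purposes; the function names `sha256_of` and `backup_matches` and the temporary file names are hypothetical, and the demonstration deliberately corrupts the copy to show a mismatch being detected.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_matches(original, backup):
    """Return True when both files have the same checksum."""
    return sha256_of(original) == sha256_of(backup)

# Demonstration with temporary files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.csv"
    backup = Path(tmp) / "backup.csv"
    original.write_bytes(b"Date,Avg Temperature\n01Jan2010,32.3\n")
    shutil.copyfile(original, backup)
    same_before = backup_matches(original, backup)
    backup.write_bytes(b"corrupted")  # simulate a damaged backup copy
    same_after = backup_matches(original, backup)

print(same_before, same_after)  # True False
```

Recording the checksum alongside each backup also lets you verify a copy later, when the original is no longer at hand.
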
Create, manage, and document your data storage system

Data files should be managed to avoid disorder. To facilitate access to files, all storage devices, locations and access accounts should be documented and accessible to team members. Use appropriate tools, such as version control tools, to keep track of the history of the data files. This will help with maintaining files in different locations, such as at multiple off-site backup locations or servers.

Data sets that result in many files structured in a file directory can be difficult to decipher. Organize files logically to represent the structure of the research/data. Include human readable "readme" files at critical levels of the directory tree. A "readme" file might include such things as explanations of naming conventions and how the structure of the directory relates to the structure of the data.

Describe format for spatial location

Spatial coordinates should be reported in decimal degrees format to at least 4 (preferably 5 or 6) digits past the decimal point. A difference of +/- 0.00001 degrees corresponds to about 1.11 meters at the equator. This does not include uncertainty introduced by a GPS instrument.

Provide latitude and longitude with south latitude and west longitude recorded as negative values, e.g., 80° 30' 00" W longitude is -80.5000.
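
The sign convention above can be sketched as a small conversion from degrees, minutes, and seconds to decimal degrees. The function name `dms_to_decimal` and the choice to round to 5 decimal places are assumptions made for this example.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South latitudes and west longitudes are returned as negative values.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    if hemisphere in ("S", "W"):
        value = -value
    return round(value, 5)  # 5 decimal places, roughly 1 meter at the equator

print(f"{dms_to_decimal(80, 30, 0, 'W'):.4f}")  # -80.5000
```
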

Make sure that all location information in a file uses the same coordinate system, including coordinate type, datum, and spheroid. Document all three of these characteristics (e.g., Lat/Long decimal degrees, NAD83 (North American Datum of 1983), WGS84 (World Geodetic System of 1984)). Mixing coordinate systems [e.g., NAD83 and NAD27 (North American Datum of 1927)] will cause errors in any geographic analysis of the data.

If locating field sites is more convenient using the Universal Transverse Mercator (UTM) coordinate system, be sure to record the datum and UTM zone (e.g., NAD83 and Zone 15N), and the easting and northing coordinate pair in meters, to ensure that UTM coordinates can be converted to latitude and longitude.

To assure the quality of the geospatial data, plot the locations on a map and visually check the location.

Describe measurement techniques

Data measurement descriptions should:

  • Describe data collection methods or protocols (can include diagrams, images, schematics, etc.)
    • How the data were collected
    • Measurement frequency and regularity
  • Describe instrumentation
    • Include manufacturer, model number, dates in use
    • Maintenance/repair history
    • Malfunction history
    • Calibration methods, scale, detection limits, and history
  • Document measurement uncertainty, including accuracy, precision, and reproducibility. Provide values in the context of the measurements, e.g., standard error, standard deviation, confidence limits.
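
As an illustration of reporting uncertainty values, the sketch below computes the mean, sample standard deviation, and standard error of the mean for a set of repeated readings using the Python standard library. The readings are hypothetical values chosen only for the example.

```python
import math
import statistics

# Hypothetical repeated measurements of the same quantity.
readings = [32.3, 34.1, 31.4, 33.2]

mean = statistics.mean(readings)
std_dev = statistics.stdev(readings)          # sample standard deviation (n - 1)
std_err = std_dev / math.sqrt(len(readings))  # standard error of the mean

print(f"n = {len(readings)}")
print(f"mean = {mean:.2f}")
print(f"standard deviation = {std_dev:.2f}")
print(f"standard error = {std_err:.2f}")
```

Confidence limits additionally require a critical value (for small samples, from the t distribution), so state which distribution and confidence level were used when reporting them.
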
Identify data sensitivity

Steps for the identification of the sensitivity of data and the determination of the appropriate security or privacy level are:

  • Determine if the data has any confidentiality concerns
    • Can an unauthorized individual use the information to do limited, serious, or severe harm to individuals, assets or an organization’s operations as a result of data disclosure?
    • Would unauthorized disclosure or dissemination of elements of the data violate laws, executive orders, or agency regulations (e.g., HIPAA or privacy laws)?
    • Does the data have any integrity concerns?
    • What would be the impact of unauthorized modification or destruction of the data?
    • Would it reduce public confidence in the originating organization?
    • Would it create confusion or controversy in the user community?
    • Could a potentially life-threatening decision be made based on the data or analysis of the data?
    • Are there any availability concerns about the data?
    • Is the information time-critical? Will another individual or system be relying on the data to make a time-sensitive decision (e.g., sensing data for earthquakes, floods, etc.)?
  • Document data concerns identified and determine overall sensitivity (Low, Moderate, High)
    • Low criticality would result in a limited adverse effect to an organization as a result of the loss of confidentiality, integrity, or availability of the data. It might mean degradation in mission capability or result in minor harm to individuals.
    • Moderate criticality would result in a serious adverse effect to an organization as a result of the loss of confidentiality, integrity, or availability of the data. It might mean a severe degradation or loss of mission capability or result in significant harm to individuals that does not involve loss of life or serious life-threatening injuries.
    • High criticality would result in a severe or catastrophic adverse effect as a result of the loss of confidentiality, integrity, or availability of the data. It might cause a severe degradation in or loss of mission capability or result in severe or catastrophic harm to individuals involving loss of life or serious life-threatening injuries.
  • Develop data access and dissemination policies and procedures based on sensitivity of the data and need-to-know.
  • Develop data protection policies, procedures and mechanisms based on sensitivity of the data.
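
One way to record the outcome of the steps above is sketched below. It assumes a "highest rating wins" rule, in which the overall sensitivity is the highest of the confidentiality, integrity, and availability ratings; the text above does not prescribe how to combine them, so treat this rule, the `Level` enumeration, and the function name `overall_sensitivity` as assumptions.

```python
from enum import IntEnum

class Level(IntEnum):
    """Sensitivity levels, ordered so that a higher value is more sensitive."""
    LOW = 1
    MODERATE = 2
    HIGH = 3

def overall_sensitivity(confidentiality, integrity, availability):
    """Assumed rule: the overall rating is the highest individual rating."""
    return max(confidentiality, integrity, availability)

# Hypothetical ratings for a data set.
rating = overall_sensitivity(Level.LOW, Level.HIGH, Level.MODERATE)
print(rating.name)  # HIGH
```
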
Incentives, Challenges, Barriers: Exploring social, institutional and economic reasons for sharing data
Use appropriate field delimiters

Delimit the columns within a data table using commas or tabs; these are listed in order of preference. Semicolons are used in many systems as line end delimiters and may cause problems if data are imported into those systems (e.g. SAS, PHP scripts). Avoid delimiters that also occur in the data fields. If this cannot be avoided, enclose data fields that also contain a delimiter in single or double quotes.

An example of a consistently delimited data file with a header row:

Date, Avg Temperature, Precipitation
01Jan2010, 32.3, 0.0
02Jan2010, 34.1, 0.5
03Jan2010, 31.4, 2.5
04Jan2010, 33.2, 0.0
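
The quoting rule described above can be sketched with Python's standard `csv` module, which encloses a field in double quotes only when it contains the delimiter. The "Site" column and its values are hypothetical additions used to show a field that contains a comma.

```python
import csv
import io

rows = [
    ["Date", "Site", "Avg Temperature", "Precipitation"],
    ["01Jan2010", "Ridge, north slope", 32.3, 0.0],  # field contains a comma
    ["02Jan2010", "Valley floor", 34.1, 0.5],
]

# Write comma-delimited text; only fields that need quoting are quoted.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back yields the same number of columns in every row.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1][1])  # Ridge, north slope
```
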