Back up your data

To avoid accidental loss of data you should:

  • Back up your data at regular intervals
    • When you complete your data collection activity
    • After you make edits to your data
  • Streaming data should be backed up at regularly scheduled points in the collection process
    • High-value data should be backed up daily or more often
    • Automation simplifies frequent backups
  • Backup strategies (e.g., full, incremental, differential) should be optimized for the data collection process
  • Create at least two copies of your data
  • Place one copy at an “off-site” and “trusted” location
    • Commercial storage facility
    • Campus file-server
    • Cloud file-server (e.g., Amazon S3, Carbonite)
  • Use a reliable device when making backups
    • External USB drive (avoid “light-weight” media, e.g., floppy disks or USB stick drives; avoid network drives that are only intermittently accessible)
    • Managed network drive
    • Managed cloud file-server (e.g., Amazon S3, Carbonite)
  • Ensure backup copies are identical to the original copy
    • Perform differential checks
    • Perform “checksum” checks
  • Document all procedures to ensure a successful recovery from a backup copy
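
The copy-and-verify steps above can be sketched in Python. This is a minimal illustration, not a complete backup tool: the function names `sha256_of` and `backup_and_verify` are hypothetical, SHA-256 is assumed as the checksum algorithm, and the backup location is assumed to sit outside the source directory. The returned manifest of checksums can be stored with the backup as part of the documented recovery procedure.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source_dir: Path, backup_dir: Path) -> dict:
    """Copy every file under source_dir into backup_dir and verify each copy.

    Returns a manifest mapping relative paths to checksums.
    Raises OSError if a copy does not match its original.
    Assumes backup_dir is not located inside source_dir.
    """
    manifest = {}
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for source in files:
        relative = source.relative_to(source_dir)
        target = backup_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and file metadata
        original_sum = sha256_of(source)
        if sha256_of(target) != original_sum:
            raise OSError(f"Checksum mismatch for {relative}")
        manifest[relative.as_posix()] = original_sum
    return manifest
```

A call such as `backup_and_verify(Path("project_data"), Path("/mnt/backup/project_data"))` (both paths are placeholders) would produce one verified copy; repeating it against an off-site target gives the second copy.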
Decide what data to preserve

The process of science generates a variety of products that are worthy of preservation. Researchers should consider all elements of the scientific process in deciding what to preserve:

  • Raw data
  • Tables and databases of raw or cleaned observation records and measurements
  • Intermediate products, such as partly summarized or coded data that are the input to the next step in an analysis
  • Documentation of the protocols used
  • Software or algorithms developed to prepare data (cleaning scripts) or perform analyses
  • Results of an analysis, which can themselves be starting points or ingredients in future analyses, e.g. distribution maps, population trends, mean measurements
  • Any data sets obtained from others that were used in data processing
  • Multimedia, whether it documents procedures or serves as standalone data

When deciding on what data products to preserve, researchers should consider the costs of preserving data:

  • Raw data are usually worth preserving
  • Consider space requirements when deciding on whether to preserve data
  • If data can be easily or automatically re-created from raw data, consider not preserving them. Conversely, if data have undergone quality control processes and been analyzed, consider preserving them, since reproduction might be costly
  • Algorithms and software source code cost very little to preserve
  • Results of analyses may be particularly valuable for future discovery and cost very little to preserve
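
One way to weigh the space requirements mentioned above is to measure how much storage each kind of product occupies. The sketch below assumes, purely for illustration, that each product category lives in its own directory; the function names and the category labels in the usage note are hypothetical.

```python
import os
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = Path(dirpath) / name
            if not file_path.is_symlink():  # skip links to avoid double counting
                total += file_path.stat().st_size
    return total


def summarize_sizes(categories: dict) -> list:
    """Return (category, size_in_bytes) pairs, largest first.

    categories maps a label (e.g. "raw data") to the directory holding it.
    """
    sizes = [
        (label, directory_size_bytes(Path(path)))
        for label, path in categories.items()
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

For example, `summarize_sizes({"raw data": "raw", "intermediate": "derived", "scripts": "code"})` would list the categories by storage cost, which helps show where preserving easily re-created products is expensive and where (as with source code) it costs very little.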

Researchers should consider the following goals and benefits of preservation:

  • Enabling re-analysis of the same products to determine whether the same conclusions are reached
  • Enabling re-use of the products for new analysis and discovery
  • Enabling restoration of original products in the case that working datasets are lost