To make your data available using standard and open software tools you should:
- Use standard language and terms to clearly communicate to others that your data are available for reuse and that you expect ethical and appropriate use of your data
- Use an open source datacasting (RSS or other type) service that enables you to advertise your data and the options for others to obtain access to it (RSS, GeoRSS, DatacastingRSS)
File names should reflect the contents of the file and include enough information to uniquely identify the data file. File names may contain information such as project acronym, study title, location, investigator, year(s) of study, data type, version number, and file type.
When choosing a file name, check for any database management limitations on file name length and use of special characters. Also, in general, lower-case names are less software and platform dependent. Avoid using...
To avoid accidental loss of data you should:
- Backup your data at regular frequencies
- When you complete your data collection activity
- After you make edits to your data
- Streaming data should be backed up at regularly scheduled points in the collection process
- High-value data should be backed up daily or more often
- Automation simplifies frequent backups
- Backup strategies (e.g., full, incremental, differential,...
Terms and phrases that are used to represent categorical data values or for creating content in metadata records should reflect appropriate and accepted vocabularies in your community or institution. Methods used to identify and select the proper terminology include:
- Identify the relevant descriptive terms used as categorical values in your community prior to start of the project (ex: standard terms describing soil horizons, plant taxonomy, sampling methodology or equipment, etc...
Information about quality control and quality assurance are important components of the metadata:
- Qualify (flag) data that have been identified as questionable by including a flagging_column next to the column of data values. The two columns should be properly associated through a naming convention such as Temperature, flag_Temperature.
- Describe the qality control methods applied and their assumptions in the metadata. Describe any software used when performing the...
To assure that metadata correctly describes what is actually in a data file, visual inspection or analysis should be done by someone not otherwise familiar with the data and its format. This will assure that the metadata is sufficient to describe the data. For example, statistical software can be used to summarize data contents to make sure that data types, ranges and, for categorical data, values found, are as described in the documentation/metadata.
The integration of multiple data sets from different sources requires that they be compatible. Methods used to create the data should be considered early in the process, to avoid problems later during attempts to integrate data sets. Note that just because data can be integrated does not necessarily mean that they should be, or that the final product can meet the needs of the study. Where possible, clearly state situations or conditions where it is and is not appropriate to use your data,...
A data dictionary provides a detailed description for each element or variable in your dataset and data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description. A data dictionary provides a concise guide to understanding and using the data.
A backup policy helps manage users' expectations and provides specific guidance on the "who, what, when, and how" of the data backup and restore process. There are several benefits to documenting your data backup policy:
- Helps clarify the policies, procedures, and responsibilities
- Allows you to dictate:
- where backups are located
- who can access backups and how they can be contacted
- how often data should be backed up...
- where backups are located
Data files should be managed to avoid disorder. To facilitate access to files, all storage devices, locations and access accounts should be documented and accessible to team members. Use appropriate tools, such as version control tools, to keep track of the history of the data files. This will help with maintaining files in different locations, such as at multiple off-site backup locations or servers.
Data sets that result in many files structured in a file directory can be...