All Best Practices
The parameters reported in the data set need to have names that clearly describe the contents. Ideally, the names should be standardized across files, data sets, and projects so that others can readily use the information.
The documentation should contain a full description of the parameter, including the parameter name, how it was measured, the units, and the abbreviation used in the data file.
A missing value code should also be defined. Use the same notation for...
Spatial coordinates should be reported in decimal degrees to at least 4 (preferably 5 or 6) decimal places. A precision of +/- 0.00001 degrees corresponds to about 1.11 meters at the equator, because one degree there spans roughly 111 km. This does not include uncertainty introduced by a GPS instrument.
Provide latitude and longitude with south latitude and west longitude recorded as negative values, e.g., 80° 30' 00" W longitude is -80.5000.
Make sure that all location information in a file uses the...
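The sign convention and decimal-degree format described above can be sketched in Python. This is a minimal illustration, not a prescribed tool; the function name `dms_to_decimal` and its argument order are assumptions made for this example.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter
    (N, S, E, or W) to signed decimal degrees.

    Hypothetical helper written for illustration only.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are recorded as negative values.
    if hemisphere in ("S", "W"):
        value = -value
    return value


longitude = dms_to_decimal(80, 30, 0, "W")
# Report at least 4 decimal places, as recommended above.
print(f"{longitude:.4f}")  # -80.5000
```

Formatting with a fixed number of decimal places keeps every coordinate in a file at the same reported precision.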
For date, always include four digit year and use numbers for months. For example, the date format yyyy-mm-dd would appear as 2011-03-15 (March 15, 2011).
If Julian day (the day of the year, counted from January 1) is used, make sure the year field is also supplied. For example, mmm.yyyy would appear as 122.2011, where mmm is the Julian day.
If the date is not completely known (e.g., the day is not known), separate the columns into the parts that do exist (e.g., separate columns for year and month). Don't introduce a day because the...
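As a rough sketch of these date conventions, the Python standard library can produce both formats. The helper names `iso_date` and `day_of_year_label`, and the `partial_record` column names, are assumptions chosen for this example.

```python
from datetime import date


def iso_date(d):
    """Format a date as yyyy-mm-dd (four-digit year, numeric month)."""
    return d.isoformat()


def day_of_year_label(d):
    """Format a date as day-of-year plus year, e.g. 122.2011."""
    day_of_year = d.timetuple().tm_yday
    return f"{day_of_year:03d}.{d.year:04d}"


print(iso_date(date(2011, 3, 15)))          # 2011-03-15
print(day_of_year_label(date(2011, 5, 2)))  # 122.2011

# When the day is unknown, keep only the parts that exist in
# separate columns instead of inventing a day (hypothetical columns).
partial_record = {"year": 2011, "month": 3}
```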
Data measurement descriptions should:
- Describe data collection methods or protocols (can include diagrams, images, schematics, etc.)
  - How the data were collected
  - Measurement frequency and regularity
- Describe instrumentation
  - Include manufacturer, model number, dates in use
  - Maintenance/repair history
  - Malfunction history
  - Calibration methods, scale, detection limits, and history
When describing the process for creating derived data products, the following information should be included in the data documentation or the companion metadata file:
- Description of primary input data and derived data
- Why processing is required
- Data processing steps and assumptions
- Assumptions about primary input data
- Additional input data requirements
- Processing algorithm (e.g., volts to mol fraction, averaging)
A description of the contents of the data file should contain the following:
- Define the parameters and the units of each parameter
- Explain the formats for dates, time, geographic coordinates, and other parameters
- Define any coded values
- Describe quality flags or qualifying values
- Define missing values
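One way to keep such a description usable is to store it next to the data in machine-readable form. The sketch below is only an illustration: the column names, the quality flag codes, and the missing value code `-9999` are all hypothetical and do not come from any particular data set.

```python
import csv
import io

# Hypothetical data dictionary: parameter definitions, units,
# formats, and coded values for each column in the file.
DATA_DICTIONARY = {
    "site_id": {"description": "Study site identifier"},
    "date": {"description": "Sampling date", "format": "yyyy-mm-dd"},
    "air_temperature": {"description": "Air temperature",
                        "units": "degrees Celsius"},
    "quality_flag": {"description": "Quality flag",
                     "codes": {"0": "good", "1": "suspect"}},
}
MISSING_VALUE_CODE = "-9999"  # assumed code; define yours in the documentation

SAMPLE = """site_id,date,air_temperature,quality_flag
A1,2011-03-15,12.4,0
A1,2011-03-16,-9999,1
"""


def read_rows(text):
    """Read CSV text, reject undocumented columns, and map the
    missing value code to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        undocumented = set(row) - set(DATA_DICTIONARY)
        if undocumented:
            raise ValueError(f"undocumented columns: {sorted(undocumented)}")
        rows.append({key: (None if value == MISSING_VALUE_CODE else value)
                     for key, value in row.items()})
    return rows


rows = read_rows(SAMPLE)
print(rows[1]["air_temperature"])  # None
```

Rejecting undocumented columns is a design choice for this sketch; a gentler variant could simply report them.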
Data sets or collections are often composed of multiple files that are related. Files may have come from (or still be stored in) a relational database, and the relationships among the data tables or other entities are important if the data are to be reused. These relationships should be documented for a repository.
Describe the overall organization of your data set or collection. Often, a data set or collection contains a large number of files, perhaps organized into a number of...
The research project description should contain the following information:
- Who: project personnel (principal investigator, researchers, technicians, others)
- Where: location and description of study site or sites
- When: range of dates for the project
- Why: rationale for the project (abstract)
- How: description of project methods
Other useful information might include the project title, the overarching project (if any), institution(s)...
If your project uses a sensor network, you should describe and document that network and the instruments it uses. This information is essential to understanding and interpreting the data you use, and should be included as a part of the metadata generated for your project's data.
- Describe the basic set-up of the sensor network installation, including such details as mount, power source, enclosures, wiring protection, etc.
- Describe instrumentation, cameras and samplers (...
The spatial extent of your data set or collection as a whole should be described. The minimum acceptable description would be a bounding box describing the northernmost, southernmost, westernmost, and easternmost limits of the data.
- If the entire collection is from a single location, use the same value for the northerly and southerly limits, and the same value for the easterly and westerly limits.
- Be sure to specify in the metadata what units you choose to describe your spatial extent.
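A bounding box of this kind can be derived directly from the sampling locations. The following is a minimal sketch that assumes coordinates are given as (latitude, longitude) pairs in signed decimal degrees; the function name `bounding_box` and the output keys are hypothetical.

```python
def bounding_box(points):
    """Return the north/south/east/west limits of (lat, lon) points.

    Assumes signed decimal degrees and does not handle collections
    that cross the 180 degree meridian.
    """
    points = list(points)
    if not points:
        raise ValueError("at least one point is required")
    latitudes = [lat for lat, _ in points]
    longitudes = [lon for _, lon in points]
    return {
        "north": max(latitudes),
        "south": min(latitudes),
        "east": max(longitudes),
        "west": min(longitudes),
        "units": "decimal degrees",  # state the units explicitly
    }


sites = [(35.5, -80.5), (36.0, -79.25), (34.75, -81.0)]
print(bounding_box(sites))

# A single location yields identical north/south and east/west limits.
print(bounding_box([(35.5, -80.5)]))
```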
The temporal extent over which the data in your dataset or collection were acquired should be described. Normally this is done by providing:
- the earliest date of data acquisition
- the date that the last data in the collection was acquired
Year, month, day, and time should be included in the description. If data collection is still ongoing, the end date can be omitted, though some statement about this should be placed in the dataset abstract...
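The begin and end dates can be computed from the acquisition timestamps themselves. This sketch assumes the timestamps are available as Python `datetime` objects; the function name `temporal_extent` and the `ongoing` flag are hypothetical.

```python
from datetime import datetime


def temporal_extent(timestamps, ongoing=False):
    """Return the earliest and latest acquisition times as ISO 8601
    strings. The end is omitted (None) if collection is ongoing."""
    ordered = sorted(timestamps)
    if not ordered:
        raise ValueError("at least one timestamp is required")
    return {
        "begin": ordered[0].isoformat(),
        "end": None if ongoing else ordered[-1].isoformat(),
    }


acquired = [
    datetime(2011, 5, 2, 14, 0),
    datetime(2011, 3, 15, 8, 30),
    datetime(2011, 4, 1, 9, 45),
]
print(temporal_extent(acquired))
# {'begin': '2011-03-15T08:30:00', 'end': '2011-05-02T14:00:00'}
```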
The units of reported parameters need to be explicitly stated in the data file and in the documentation. We recommend SI units (The International System of Units) but recognize that each discipline has its own commonly used units of measure. The critical aspect here is that the units be defined so that others understand what is reported.
Do not use abbreviations when describing the units. For example, the units for respiration are moles of carbon dioxide per meter squared per year....
Just as data checking and review are important components of data management, so is the step of documenting how these tasks were accomplished. Creating a plan for how to review the data before it is collected or compiled allows a researcher to think systematically about the kinds of errors, conflicts, and other data problems they are likely to encounter in a given data set. When associated with the resulting data and metadata, these documented quality control procedures help provide a...
File formats are important for understanding how data can be used and possibly integrated. The following issues need to be documented:
- Does the file format of the data adhere to one or more standards?
- Is that file standard an open (i.e. open source) or closed (i.e. proprietary) format?
- Is a particular software package required to read and work with the data file? If so, the software package, version, and operating system platform should be cited in the metadata...
Different types of new data may be created in the course of a project, for instance visualizations, plots, statistical outputs, a new dataset created by integrating multiple datasets, etc. Whenever possible, document your workflow (the process used to clean, analyze and visualize data) noting what data products are created at each step. Depending on the nature of the project, this might be as a computer script, or it may be notes in a text file documenting the process you used (i.e....