|Ensure basic quality control|
Quality control practices are specific to the type of data being collected, but some generalities exist:
- Data collected by instruments:
- Values recorded by instruments should be checked to ensure they are within the sensible range of the instrument and the property being measured. Example: Concentrations cannot be < 0, and wind speed cannot exceed the maximum speed that the anemometer can record.
- Analytical results:
- Values measured in the laboratory should be checked to ensure that they are within the detection limits of the analytical method and are valid for what is being measured. If values are below the detection limit, they should be properly coded and qualified.
- Any ancillary data used to assess data quality should be described and stored. Example: data used to compare instrument readings against known standards.
- Observations (such as bird counts or plant cover):
- Range checks and comparisons with historic maxima will help identify anomalous values that require further investigation.
- Comparing current and past measurements helps identify highly unlikely events. For example, it is unlikely that the girth of a tree will decrease from one year to the next.
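The range check and the comparison with past measurements described above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample values, and the assumed 60 m/s anemometer maximum are hypothetical and not taken from any particular instrument or protocol.

```python
def out_of_range(values, low, high):
    """Return the indices of values outside the closed interval [low, high]."""
    return [i for i, v in enumerate(values) if v < low or v > high]


def decreases(series):
    """Return the indices where a value is smaller than the one before it."""
    return [i for i in range(1, len(series)) if series[i] < series[i - 1]]


# Hypothetical wind speeds (m/s); the anemometer is assumed to top out at 60 m/s.
wind_speed = [3.2, 7.9, -1.0, 61.5, 12.4]
suspect_wind = out_of_range(wind_speed, 0.0, 60.0)

# Hypothetical yearly girth measurements (cm) for a single tree.
girth = [101.0, 103.5, 102.0, 106.2]
suspect_girth = decreases(girth)

print(suspect_wind)   # positions of readings to investigate
print(suspect_girth)  # years in which the girth appears to shrink
```

Flagged positions are candidates for further investigation, not automatic deletions; a value can be unusual and still be correct.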
Codes should be used to indicate quality of data.
- Codes should be checked against the list of allowed values to validate code entries.
- When coded data are digitized, they should be re-checked against the original source. Double data entry, or having another person check and validate the data entered, is a good mechanism for identifying data entry errors.
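Both checks above lend themselves to automation. The sketch below assumes a hypothetical set of quality codes (`G`, `Q`, `B`) and two independently typed copies of the same records; none of these names come from an established standard.

```python
# Hypothetical quality codes: good, questionable, bad.
ALLOWED_CODES = {"G", "Q", "B"}


def invalid_codes(codes, allowed):
    """Return (index, code) pairs for entries not in the allowed list."""
    return [(i, c) for i, c in enumerate(codes) if c not in allowed]


def entry_mismatches(first, second):
    """Return the indices where two independent entries of the same data differ."""
    if len(first) != len(second):
        raise ValueError("both entries must contain the same number of records")
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


entered = ["G", "g", "Q", "X", "B"]
bad_codes = invalid_codes(entered, ALLOWED_CODES)

# Double data entry: the same field sheet typed by two different people.
first_pass = ["12.1", "13.4", "9.8", "10.0"]
second_pass = ["12.1", "13.4", "8.9", "10.0"]
to_recheck = entry_mismatches(first_pass, second_pass)

print(bad_codes)
print(to_recheck)
```

Records listed in `to_recheck` should be compared against the original source, since either of the two entries may be the one in error.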
Dates and times:
- Ensure that dates and times are valid.
- Time zones should be clearly indicated (UTC or local).
- Values should be consistent with the data type (integer, character, datetime) of the column in which they are entered. Example: 12-20-2000A should not be entered in a column of dates.
- Use consistent data types in your data files. A database, for instance, will prevent entry of a string into a column identified as having integer data.
- Plot coordinates on a map to detect location errors.
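As an illustration of the date checks above, the following sketch uses Python's standard `datetime` module to reject entries that are not valid dates and to write a timestamp with an explicit UTC offset. The `%m-%d-%Y` format and the sample entries are assumptions made for the example; a real data set should document its own date format.

```python
from datetime import datetime, timezone


def parse_date(text, fmt="%m-%d-%Y"):
    """Return a datetime if text is a valid date in the given format, else None."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


# Hypothetical entries: one valid, one with a stray character, one impossible date.
entries = ["12-20-2000", "12-20-2000A", "02-30-2001"]
invalid = [e for e in entries if parse_date(e) is None]

# Attaching the time zone makes the offset part of the stored value.
stamp = datetime(2000, 12, 20, 14, 30, tzinfo=timezone.utc).isoformat()

print(invalid)
print(stamp)
```

Storing timestamps in ISO 8601 form with the offset included is one way to make the time zone unambiguous; stating the zone in the metadata is another.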
|Mark data with quality control flags|
As part of any review or quality assurance of data, potential problems can be categorized systematically. For example, data can be labeled as 0 for unexamined, -1 for potential problems, and 1 for "good data." Some research communities have developed standard protocols; check with others in your discipline to determine if standards for data flagging already exist.
The marine community has many examples of quality control flags that can be found on the web. There do not yet seem to be standards across the marine or terrestrial communities.
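A minimal sketch of the 0 / -1 / 1 flagging scheme mentioned above, assuming a simple range check is the review being applied. The constant names, the limits, and the sample readings are hypothetical; a community standard, where one exists, should take precedence.

```python
# Flag values from the example scheme: 0 unexamined, -1 potential problem, 1 good.
UNEXAMINED, SUSPECT, GOOD = 0, -1, 1


def review(values, low, high):
    """Return a flag for each value: GOOD inside [low, high], SUSPECT otherwise."""
    return [GOOD if low <= v <= high else SUSPECT for v in values]


# Hypothetical readings; every record starts out unexamined.
readings = [4.1, -2.0, 9.7, 12.3]
flags = [UNEXAMINED] * len(readings)

# Only the first three records have been reviewed so far.
flags[:3] = review(readings[:3], 0.0, 50.0)

print(list(zip(readings, flags)))
```

Keeping the flags in a separate column next to the values preserves the original measurements while recording the outcome of the review.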
|Use consistent codes|
Be consistent in the use of codes to indicate categorical variables, for example species names, sites, or land cover types. Codes should always be the same within one data set. Pay particular attention to spelling and case; the most frequent problems are with abbreviations for species names and sites.
Consistent codes can be achieved most easily by defining standard categorical variables (codes) and using drop-down lists (for example in Excel or a database). Frequently a code is needed for ‘none of the above’ or ‘unknown’ or ‘other’ to avoid imprecise code assignment.
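Where drop-down lists are not available, entries can be checked against the standard list after the fact. The sketch below separates entries that differ from a standard code only in case or surrounding whitespace from entries that match no code at all. The four-letter species codes, including `UNKN` for 'unknown', are hypothetical.

```python
def check_codes(raw_codes, allowed):
    """Split non-standard entries into fixable variants and unrecognized codes.

    Returns (fixable, unrecognized): fixable maps each entry that differs only
    in case or surrounding whitespace to its standard code; unrecognized lists
    entries that match no standard code.
    """
    canonical = {code.lower(): code for code in allowed}
    fixable, unrecognized = {}, []
    for raw in raw_codes:
        if raw in allowed:
            continue  # already a standard code
        key = raw.strip().lower()
        if key in canonical:
            fixable[raw] = canonical[key]
        else:
            unrecognized.append(raw)
    return fixable, unrecognized


# Hypothetical species codes, with UNKN reserved for 'unknown'.
SPECIES = {"ACRU", "QUAL", "PIST", "UNKN"}
entered = ["ACRU", "acru", "QUAL ", "PIST", "QAUL"]
fixable, unrecognized = check_codes(entered, SPECIES)

print(fixable)
print(unrecognized)
```

Unrecognized entries are reported rather than silently recoded, so that a misspelling such as `QAUL` is corrected by someone who can consult the original record.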