All Best Practices
Data tables should ideally include values that were acquired in a consistent fashion. However, sometimes instruments fail and gaps appear in the records. For example, a data table representing a series of temperature measurements collected over time from a single sensor may include gaps due to power loss, sensor drift, or other factors. In such cases, it is important to document that a particular record was missing and replaced with an estimated or gap-filled value.
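Below is a minimal Python sketch of documenting gap-filled values. It assumes linear interpolation as the estimation method, which the text does not prescribe, and it uses hypothetical flag labels. Every estimated value carries a record of how it was produced, and gaps that cannot be estimated stay marked as missing.

```python
from typing import List, Optional, Tuple


def fill_gaps(values: List[Optional[float]]) -> List[Tuple[Optional[float], str]]:
    """Linearly interpolate interior gaps (None) and flag every value.

    Returns (value, flag) pairs. The flag labels are hypothetical:
    "measured", "gap-filled", or "missing" for gaps at either end
    that have no neighbor to interpolate from.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    result = []
    for i, v in enumerate(values):
        if v is not None:
            result.append((v, "measured"))
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if not before or not after:
            # No measured neighbor on one side: leave the gap and record it.
            result.append((None, "missing"))
            continue
        lo, hi = before[-1], after[0]
        frac = (i - lo) / (hi - lo)
        estimate = values[lo] + frac * (values[hi] - values[lo])
        result.append((estimate, "gap-filled"))
    return result


readings = [12.0, None, 14.0, None, None, 17.0, None]
for value, flag in fill_gaps(readings):
    print(value, flag)
```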
Choose the right data type and precision for the data in each column. For example: (1) use date fields for dates, and (2) use numeric fields with the appropriate decimal precision for numbers. Comments and explanations should not be included in a column that is meant to hold only numeric values; put them in a separate column designed for text. This lets users take advantage of specialized search and computing functionality and improves data quality. If a particular spreadsheet or...
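As a rough illustration, the Python sketch below loads a small hypothetical CSV export in which a note has crept into a numeric column. Dates are parsed into real `date` objects and temperatures into floats. Any text that is not a number is moved into a separate comment column. The column names and sample rows are invented for the example.

```python
import csv
import io
from datetime import date

# Hypothetical raw export in which a comment has crept into the numeric column.
RAW = """sample_date,temperature_c
2023-06-01,12.5
2023-06-02,sensor reset
2023-06-03,13.25
"""


def load_typed(text):
    """Parse rows into typed fields, moving non-numeric notes to a comment column."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        raw_value = rec["temperature_c"].strip()
        try:
            value, comment = float(raw_value), ""
        except ValueError:
            # Keep the numeric column numeric; preserve the text as a comment.
            value, comment = None, raw_value
        rows.append({
            "sample_date": date.fromisoformat(rec["sample_date"]),
            "temperature_c": value,
            "comment": comment,
        })
    return rows


for row in load_typed(RAW):
    print(row)
```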
As part of any review or quality assurance of data, potential problems can be categorized systematically. For example, data can be labeled 0 for unexamined, -1 for potential problems, and 1 for "good data." Some research communities have developed standard protocols; check with others in your discipline to determine whether standards for data flagging already exist.
The marine community has many examples of quality control flags that can be found on the web. There does not yet seem to be...
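The example coding above (0, -1, 1) can be sketched in Python as an enumeration plus a simple range check. The `QCFlag` and `range_check` names and the valid range are hypothetical. A real review would usually combine several checks and should follow the discipline's flagging standard where one exists.

```python
from enum import IntEnum


class QCFlag(IntEnum):
    """The example coding from the text."""

    UNEXAMINED = 0
    POTENTIAL_PROBLEM = -1
    GOOD = 1


def range_check(values, low, high):
    """Flag each value against an assumed valid range [low, high]."""
    flags = []
    for v in values:
        if v is None:
            # A missing value cannot be range-checked, so it stays unexamined.
            flags.append(QCFlag.UNEXAMINED)
        elif low <= v <= high:
            flags.append(QCFlag.GOOD)
        else:
            flags.append(QCFlag.POTENTIAL_PROBLEM)
    return flags


flags = range_check([4.2, 57.0, None, 18.9], low=-5.0, high=40.0)
print([int(f) for f in flags])  # [1, -1, 0, 1]
```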
A Data Management Plan should include the following information:
- Types of data to be produced and their volume
- Who will produce the data
- Standards that will be applied
- File formats and organization, parameter names and units, spatial and temporal resolution, metadata content, etc.
- Methods for preserving the data and maintaining data integrity
- What hardware / software resources are required to store the...
Multimedia data present unique challenges for data discovery, accessibility, and metadata formatting, and should be thoughtfully managed. Researchers should establish their own requirements for managing multimedia during and after a research project using the following guidelines. Multimedia data include still images, moving images, and sound. The Library of Congress has a set of web pages discussing many of the issues to be considered when creating and working with multimedia data....
In order to preserve the raw data for future use:
- Do not make any changes / corrections to the original raw data file
- Use a scripting language (e.g., R) or another programming language whose code can be documented (e.g., C, Java, Python) to perform analyses or make corrections, and save that code in a separate file
- The code, along with appropriate documentation, serves as a record of the changes
- The code can be modified and rerun, using the raw data...
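Here is a minimal sketch of this workflow in Python, one of the languages listed above. The file names and the constant calibration `offset` are hypothetical stand-ins for whatever correction a project actually needs. The raw file is opened only for reading, corrected values go to a new file, and a SHA-256 checksum confirms that the raw file is unchanged.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def sha256(path):
    """Checksum used to confirm that the raw file is never modified."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def apply_corrections(raw_path, out_path, offset):
    """Write a corrected copy; the raw file is only opened for reading.

    `offset` is a hypothetical calibration correction added to every value.
    """
    with open(raw_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row["value"] = f"{float(row['value']) + offset:.2f}"
            writer.writerow(row)


# Demonstration in a temporary directory with a tiny hypothetical raw file.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw_readings.csv"
raw.write_text("time,value\n00:00,10.00\n01:00,10.50\n")
before = sha256(raw)
apply_corrections(raw, workdir / "corrected_readings.csv", offset=-0.25)
assert sha256(raw) == before  # raw data untouched
print((workdir / "corrected_readings.csv").read_text())
```

Because the correction lives in code rather than in hand edits, changing the offset and rerunning the script regenerates the corrected file from the untouched raw data.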
For appropriate attribution and provenance of a dataset, the following information should be included in the data documentation or the companion metadata file:
- Name the people responsible for the dataset throughout its lifetime, including for each person:
- Contact information
- Role (e.g., principal investigator, technician, data manager)
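One way to keep this information machine-readable is a small structured record that can be written to a companion metadata file. The Python sketch below is hypothetical: the field names, the JSON layout, and the placeholder people are illustrative only and do not follow any particular metadata standard. It also reports anyone whose contact information or role is blank.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ResponsibleParty:
    name: str
    contact: str  # e.g., an email address
    role: str  # e.g., "principal investigator", "technician", "data manager"


@dataclass
class DatasetAttribution:
    title: str
    people: List[ResponsibleParty] = field(default_factory=list)

    def incomplete_entries(self) -> List[str]:
        """Names of people whose contact information or role is blank."""
        return [p.name for p in self.people if not p.contact.strip() or not p.role.strip()]

    def to_json(self) -> str:
        """Serialize the record for a companion metadata file."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical placeholder entries.
record = DatasetAttribution(
    title="Stream temperature, example site",
    people=[
        ResponsibleParty("A. Researcher", "a.researcher@example.org", "principal investigator"),
        ResponsibleParty("B. Technician", "", "technician"),
    ],
)
print(record.incomplete_entries())  # ['B. Technician']
print(record.to_json())
```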
According to the International Polar Year Data and Information Service,...
As a best practice, one must first acknowledge that the process of managing data will incur costs. Researchers should plan to address these costs and the allocation of resources in the early planning phases of the project. This best practice focuses on data management costs during the life cycle of the project, and does not aim to address costs of data beyond the end of the project.
Budgeting and costing for your project is dependent upon institutional resources, services, and...
People have different perspectives on what data means to them, and how it can be used and interpreted in different contexts. Data users ranging from community participants to researchers in different domains can provide unique and valuable insights into data through the use of annotation and tagging. The community-generated notes and tags should be discoverable through the data search engine to enhance discovery and use.
When providing capabilities for community tagging and...
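The following is a minimal in-memory sketch in Python of making community tags searchable. The `TagIndex` class and the dataset identifiers are hypothetical, and a production system would store tags in the repository's search engine rather than in a dictionary. Tags are normalized so that trivially different spellings match, and each tag keeps its contributor so that annotations can be attributed.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index mapping community tags to dataset identifiers."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    @staticmethod
    def _normalize(tag):
        # Case-fold and collapse whitespace so "Soil  Moisture" matches "soil moisture".
        return " ".join(tag.casefold().split())

    def add_tag(self, dataset_id, tag, contributor):
        # Keep the contributor so each annotation can be attributed.
        self._by_tag[self._normalize(tag)].add((dataset_id, contributor))

    def search(self, tag):
        """Return the sorted identifiers of datasets carrying this tag."""
        hits = self._by_tag.get(self._normalize(tag), set())
        return sorted({dataset_id for dataset_id, _ in hits})

    def contributors(self, tag):
        """Return the sorted contributors who applied this tag."""
        hits = self._by_tag.get(self._normalize(tag), set())
        return sorted({who for _, who in hits})


index = TagIndex()
index.add_tag("ds-002", "Soil Moisture", "community volunteer")
index.add_tag("ds-001", "soil  moisture", "hydrologist")
index.add_tag("ds-001", "drought", "hydrologist")
print(index.search("SOIL MOISTURE"))  # ['ds-001', 'ds-002']
```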
In order to ensure replicable data access:
- Choose a widely used data identification standard based on the practices or preferences of your user community
- Consistently apply the standard
- Maintain the linkage between each identifier and the resource it references
- Participate in implementing infrastructure for consistent access to the resources referenced by the Identifier...
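The Digital Object Identifier (DOI) system is one widely used identifier standard. Choosing it here is an assumption, not a recommendation from the text. The Python sketch below normalizes common DOI notations and always produces a link through the https://doi.org/ resolver, which is one way to apply the standard consistently. The regular expression is a simplified pattern for typical DOIs, not the full DOI syntax.

```python
import re

# Simplified pattern: "10." directory indicator, numeric registrant code, "/", suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written; all are reduced to the bare DOI name.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Normalize a DOI and return a link through the https://doi.org/ resolver."""
    doi = doi.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.1000/182"))  # https://doi.org/10.1000/182
```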
Provide versions of data products with defined identifiers to enable discovery and use
Items to consider when versioning data products:
- Develop a definition of what constitutes a new version of the data, for example:
- New processing algorithms
- Additions or removal of data points
- Time or date range
- Included parameters
- Data format
- Immutability of versions
- Develop a standard naming convention for...
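The items above can be sketched in Python as an immutable version record. The change categories mirror the list above, but the `DataVersion` class, the category names, and the `<dataset>_v<number>` naming convention are hypothetical. A frozen dataclass cannot be modified after creation, so a qualifying change produces a new version instead of altering an existing one.

```python
from dataclasses import dataclass

# Change categories that count as a new version (mirrors the list above; names are hypothetical).
NEW_VERSION_TRIGGERS = frozenset(
    {"processing_algorithm", "data_points", "date_range", "parameters", "data_format"}
)


@dataclass(frozen=True)
class DataVersion:
    """Immutable version record: a qualifying change creates a new object."""

    dataset: str
    number: int

    @property
    def identifier(self) -> str:
        # Hypothetical naming convention: <dataset>_v<number>
        return f"{self.dataset}_v{self.number}"


def next_version(current: DataVersion, change: str) -> DataVersion:
    """Return a new version for a qualifying change, otherwise the current one."""
    if change not in NEW_VERSION_TRIGGERS:
        return current
    return DataVersion(current.dataset, current.number + 1)


v1 = DataVersion("stream_temperature", 1)
v2 = next_version(v1, "data_format")
print(v1.identifier, v2.identifier)  # stream_temperature_v1 stream_temperature_v2
```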
When creating the data management plan, identify everyone who may have a stake in the data, so that future users can easily determine whose permission they may need. Possible stakeholders include but are not limited to:
- Funding body
- Host institution for the project
- Home institution of contributing researchers
- Repository where the data are deposited
It is considered a matter of professional ethics to acknowledge the work of other scientists and provide...
The plan should be created at the conceptual stage of the project. It should be treated as a living document and a road map for the project, and it should be closely followed. Any changes to the data management plan should be made deliberately, and the plan should be updated throughout the data life cycle.
Data management planning provides crucial guidance to all stages of the data life cycle. It provides continuity for operations within the research group. The data management plan will...
A separate column should be used for data qualifiers, descriptions, and flags; otherwise, problems can arise during analysis. Potential entries in the descriptor column:
- Potential sources of error
- Missing value justification (e.g., sensor offline, human error, data rejected as outside the valid range, data not recorded)
- Flags for values outside the expected range, questionable values, etc.
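The short Python sketch below shows why the separate column matters. Because the value column contains only numbers or blanks, it can be parsed directly, while the descriptor column explains each blank and flag. The table and the choice to exclude out-of-range values from this particular calculation are hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical table: the value column stays purely numeric, and the
# descriptor column carries qualifiers, missing-value reasons, and flags.
TABLE = """site,value,descriptor
A,3.1,
B,,sensor offline
C,42.0,outside expected range
D,2.9,
"""


def usable_values(text, exclude=("outside expected range",)):
    """Return numeric values, skipping blanks and rows whose descriptor is excluded."""
    values = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row["value"]:
            continue  # missing; the reason is recorded in the descriptor column
        if row["descriptor"] in exclude:
            continue  # flagged values stay in the table but are left out of this analysis
        values.append(float(row["value"]))
    return values


print(mean(usable_values(TABLE)))
```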
All research requires the sharing of information and data. The general philosophy is that data are freely and openly shared. However, funding organizations and institutions may require their investigators to cite the impact of their work, including shared data. Creating a usage rights statement and including it in the data documentation makes clear to users of your data what the conditions of use are and how to acknowledge the data source.
Include a statement describing the "usage...