|Create, manage, and document your data storage system|
Data files should be managed to avoid disorder. To facilitate access, document all storage devices, locations, and access accounts, and make that documentation available to team members. Use appropriate tools, such as version control systems, to keep track of the history of the data files. This helps when maintaining files in multiple locations, such as off-site backup locations or servers.
Data sets comprising many files in a directory tree can be difficult to decipher. Organize files logically so that the directory structure reflects the structure of the research and its data. Include human-readable "readme" files at critical levels of the directory tree. A "readme" file might include such things as explanations of naming conventions and of how the structure of the directory relates to the structure of the data.
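One way to keep such a convention from drifting is to audit the directory tree for missing documentation. The sketch below is a minimal illustration (the project path and accepted readme filenames are hypothetical assumptions): it walks a project directory and reports every folder that lacks a readme file.

```python
import os

def folders_missing_readme(root, readme_names=("README", "README.txt", "README.md")):
    """Walk the directory tree under `root` and return the folders
    that contain no readme file (matched case-insensitively)."""
    wanted = {name.lower() for name in readme_names}
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not any(f.lower() in wanted for f in filenames):
            missing.append(dirpath)
    return missing

if __name__ == "__main__":
    # "my_project" is a hypothetical project directory.
    for folder in folders_missing_readme("my_project"):
        print("no readme in:", folder)
```

A check like this could be run before each backup or release, so that undocumented folders are caught while the person who created them still remembers what they contain.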
|Define expected data outcomes and types|
In the planning process, researchers should carefully consider what data will be produced in the course of their project.
Consider the following:
- What types of data will be collected? e.g., spatial, temporal, instrument-generated, models, simulations, images, video, etc.
- How many data files of each type are likely to be generated during the project? What size will they be?
- For each type of data file, what are the variables that are expected to be included?
- What software programs will be used to generate the data?
- How will the files be organized in a directory structure on a file system or in some other system?
- Will metadata information be stored separately from the data during the project?
- What is the relationship between the different types of data?
- Which of the data products are of primary importance and should be preserved for the long-term, and which are intermediate working versions not of long-term interest?
When preparing a data management plan, defining the types of data that will be generated helps in planning for short-term organization, the analyses to be conducted, and long-term data storage.
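One lightweight way to act on the questions above is to record the answers in a machine-readable inventory at planning time. The sketch below is illustrative only (every product name, count, size, and variable list is a hypothetical example): it captures a few expected data products and separates those to preserve long-term from intermediate working files.

```python
# Hypothetical inventory of the data products a project expects to generate.
expected_data = [
    {"name": "site_temperature", "type": "instrument-generated",
     "format": "csv", "est_files": 200, "est_size_mb": 5,
     "variables": ["site_id", "timestamp", "temp_c"],
     "software": "logger export tool", "preserve_long_term": True},
    {"name": "model_runs", "type": "simulation",
     "format": "netcdf", "est_files": 50, "est_size_mb": 800,
     "variables": ["lat", "lon", "time", "temp_c"],
     "software": "climate model", "preserve_long_term": False},
]

# Which products must be planned for long-term storage?
to_preserve = [d["name"] for d in expected_data if d["preserve_long_term"]]

# Rough total volume, to size storage and backups.
total_mb = sum(d["est_files"] * d["est_size_mb"] for d in expected_data)
```

Keeping the inventory in a structured form makes it easy to total up expected volumes for storage budgeting and to list exactly which products the long-term preservation plan must cover.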
|Define roles and assign responsibilities for data management|
In addition to the primary researcher(s), there may be others involved in the research process who take part in aspects of data management. By clearly defining the roles and responsibilities of the parties involved, data are more likely to be available for use by the primary researchers and by anyone re-using the data. Roles and responsibilities should be clearly defined rather than assumed; this is especially important for collaborative projects that involve many researchers, institutions, and/or groups.
Examples of roles in data management:
- data collector
- metadata generator
- data analyzer
- project director
- data model and/or database designer
- computing staff responsible for backup and/or storage
- staff responsible for running instruments
- administrative support staff responsible for grant submission
- specialized skills as defined in the plan (GIS, relational database design/implementation, computer programming of sensors/input forms, etc.)
- external data center or archive
Steps for assigning data management responsibilities:
- For each task identified in your data management plan, identify the skills needed to perform the task
- Match skills needed to available staff and identify gaps
- Develop training/hiring plan
- Develop staffing/training budget and incorporate into project budget
- Assign responsible parties and monitor results
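The assignment steps above amount to a simple gap analysis: map each data-management task to the skills it requires, compare against the skills available on staff, and feed any gaps into the training or hiring plan. The sketch below illustrates this with hypothetical tasks, skills, and staff names:

```python
# Hypothetical tasks from a data management plan, with the skills each needs.
task_skills = {
    "design relational database": {"relational database design"},
    "program sensor input forms": {"computer programming"},
    "nightly backups": {"systems administration"},
}

# Hypothetical skills of the available staff.
staff_skills = {
    "alice": {"relational database design", "computer programming"},
    "bob": {"metadata standards"},
}

# Pool all skills available across staff, then list unmet needs per task.
available = set().union(*staff_skills.values())
gaps = {task: needed - available
        for task, needed in task_skills.items()
        if needed - available}
```

Here `gaps` would show that no one on staff can cover the backup task, which is exactly the input needed for the training/hiring plan and the staffing budget.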
|Identify data sensitivity|
Steps for identifying the sensitivity of data and determining the appropriate security or privacy level:
- Determine whether the data raise any confidentiality concerns
  - Could an unauthorized individual use the information to do limited, serious, or severe harm to individuals, assets, or an organization’s operations as a result of data disclosure?
  - Would unauthorized disclosure or dissemination of elements of the data violate laws, executive orders, or agency regulations (e.g., HIPAA or privacy laws)?
- Determine whether the data raise any integrity concerns
  - What would be the impact of unauthorized modification or destruction of the data?
  - Would it reduce public confidence in the originating organization?
  - Would it create confusion or controversy in the user community?
  - Could a potentially life-threatening decision be made based on the data or on an analysis of the data?
- Determine whether there are any availability concerns about the data
  - Is the information time-critical? Will another individual or system rely on the data to make a time-sensitive decision (e.g., sensor data for earthquakes, floods, etc.)?
- Document the data concerns identified and determine the overall sensitivity (Low, Moderate, High)
  - Low: a loss of confidentiality, integrity, or availability would have a limited adverse effect on the organization. It might mean degradation in mission capability or minor harm to individuals.
  - Moderate: a loss of confidentiality, integrity, or availability would have a serious adverse effect on the organization. It might mean significant degradation in mission capability or significant harm to individuals that does not involve loss of life or serious life-threatening injuries.
  - High: a loss of confidentiality, integrity, or availability would have a severe or catastrophic adverse effect on the organization. It might mean a severe degradation in or loss of mission capability, or severe or catastrophic harm to individuals involving loss of life or serious life-threatening injuries.
- Develop data access and dissemination policies and procedures based on sensitivity of the data and need-to-know.
- Develop data protection policies, procedures and mechanisms based on sensitivity of the data.
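The classification step can be sketched as the "high water mark" rule used by FIPS 199 security categorization: the overall sensitivity is the highest impact level assigned to any of the three concerns (confidentiality, integrity, availability). The sketch below is a minimal illustration, and the sample ratings in the comment are hypothetical:

```python
LEVELS = ["Low", "Moderate", "High"]

def overall_sensitivity(confidentiality, integrity, availability):
    """Return the highest ("high water mark") of the three impact levels."""
    ratings = (confidentiality, integrity, availability)
    return max(ratings, key=LEVELS.index)

# Hypothetical ratings for a sensor-data product: integrity rated High
# because a life-threatening decision could be based on bad readings.
print(overall_sensitivity("Low", "High", "Moderate"))  # High
```

The resulting level then drives the access, dissemination, and protection policies described in the last two steps, so even a Low rating on two concerns does not relax the controls demanded by the third.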
|Incentives, Challenges, Barriers: Exploring social, institutional and economic reasons for sharing data|