Laura: Mid-career oceanographer

Background

Photo credit: http://bit.ly/1V9Cm2Z (CC BY 2.0).
Picture is of Stephanie Mendes, University of California at Santa Barbara.
The person represented here is not affiliated with DataONE and use of their image does not reflect endorsement of DataONE services.

Name, age, and education: 

Laura is a tenured associate scientist at Woods Hole Oceanographic Institution. She received a PhD in marine biology from University of Rhode Island in 1990.

Life or career goals, fears, hopes, and attitudes: 

Laura’s research investigates the effect of climate change on marine food webs, for which she needs to correlate environmental and species data. She is driven by intellectual curiosity and the need to publish and to obtain grant funding to support her research.

A day in the life: 

Laura has been funded by NOAA to do a series of cruises in the Gulf of Mexico, for which she goes to sea several times per year. She collects field data about the biology and chemistry of the water. During her cruises, she collects data on temperature, salinity, irradiance and fluorescence using instruments on board. Her research team collects water samples for later analysis in the laboratory (nitrate, nitrite, phosphate, ammonium and silicate concentrations, plankton counts, and molecular analyses). Some of these data are entered directly into a spreadsheet, but some are recorded onto printed data sheets. Much of her data comes directly from instruments and must be transformed to be useful. Each instrument and analysis has its own limits of detection and precision that must be taken into account.

One of the most time-consuming aspects of her analysis is the plankton counts. Using a microscope, she must identify and quantify plankton species. Plankton identification can be very tedious and has a steep learning curve. Right now she uses a collection of books to make her identifications, but many of her taxonomic categories are used only within her lab, making it difficult to share these data broadly. These data are recorded onto a data sheet and must be transformed to be useful.

Laura’s lab uses the Federal Geographic Data Committee (FGDC) metadata guidelines and the MERMAid tool for metadata management.

Reasons for using DataONE to share and to reuse data
Needs and expectations of DataONE tools: 

NOAA requires that Laura upload her data into the NODC. She also uploads her data to the WHOI data store. She will send out data in spreadsheets to other researchers if asked and if she has the time to put them together. In some cases, such sharing has led to interesting collaborations and she would be open to such opportunities in the future. Laura would gladly pay her data manager to upload her data into additional repositories as long as she could keep track of usage and gain citations. She would appreciate a chance to review and comment on the results of such use before publication, to receive copies of published articles and to ensure that her funding agency is acknowledged. It would be better if DataONE could streamline the process of sharing data. Finally, she is aware that other researchers have created derivative datasets based on her data (among others). While she believes this is an appropriate use of her data, she would like some way to assess the impact of this contribution and to receive credit for her work.

Laura would like to compare her field data with comparable field data and with historical data to identify trends, but she does not routinely download data from NODC because of its poor usability. She would like to be able to go to one place, download all phosphate measurements made in the Gulf of Mexico (for example) and receive those data in a file, formatted to her specifications.

Intellectual and physical skills that can be applied: 

Laura is already adept at handling large datasets, including those with abstract data fields that cannot easily be managed by hand. The process of compiling and analyzing her data can be complex; thus, she already understands that a crucial reporting requirement for sharing data is to also share the specific methods and subsequent data management steps taken to render the data amenable to analysis. She has first-hand experience with data of poor usability and she is keen to provide her data to a system and in a manner that does not suffer these same problems.

Technical support available: 

Laura employs a science diver who also acts as a data manager. WHOI also provides some data support. Her cruises are instrument and data heavy. Given the quantities of data involved, she must be able to use protocols that largely automate the process of data deposition into DataONE-affiliated repositories, or else it will be too much work for her to contemplate.

Personal biases about data sharing and reuse (and data management more generally): 

Laura accepts and takes seriously the obligation to upload her data. She needs a wide variety of data and believes that sharing data is important to be able to create a holistic view of the ecosystems she studies. She also believes that many of the data she needs for her own research likely already exist, albeit in an inaccessible or unhelpful form. Part of her motivation to share data is to illustrate best practices and to encourage more people to develop tools and processes which might help in her own research.

Comparison of current and DataONE-enabled practices:
Project Planning: 
  • Management Planning: Develops a project Data Management Plan following examples provided on the DataONE portal.
Current data collection: 
  • Field data.
  • Digital and analog.
DataONE-enabled data collection: 

No change.

Current data assurance: 
  • Undertakes basic validation steps as preparation for using her own statistical software.
  • Must transform much of her data to analyze and interpret it.
  • Her data manager handles most of these workflows. If they encounter errors in the batch sampling files, they have to discard all of the data in those files, since the errors might reflect instrument error.
DataONE-enabled assurance: 

If DataONE tools can perform some of these validation steps more easily than is possible through the programs Laura currently uses, she would be willing to switch her workflow to managing the process using these tools. Absent that improvement, she might also be willing to use a workflow-management tool from DataONE, using a DataONE template for assuring that she has taken care of all the necessary steps for data deposition (including description and deposit), even though most of the actual work at each step is taking place outside of DataONE. Note that this possibility is especially attractive if adherence to the DataONE template guarantees that she will be able to seamlessly deposit her data into multiple member repositories with no additional requirements.

Current data description: 
  • Laura uses some field-supported standardized metadata schemas already.
  • Laura also adds some custom fields for internal purposes, though she thinks these fields are likely to be useful to others as well.
  • Other data description is peculiar to her lab, which hampers sharing data.
DataONE-enabled description: 

As with data assurance, if DataONE provides tools for facilitating data description, including template support for standardized and customized metadata schemas, then she might migrate this part of the workflow to DataONE. Otherwise, this part of her workflow is not likely to be significantly impacted by DataONE.

Current data preservation: 
  • Laura and her data manager currently have to exert some effort to upload her data into existing repositories for her discipline. She feels strongly that this is an important activity and will continue to exert this effort regardless.
  • Data Preservation: Collects data during summer research season and deposits the data in a data repository (a DataONE Member Node).
  • She is interested in giving greater access to her data but does not know where else to put the data and cannot devote even more time to further data deposition.
  • Laura presumes that her current activities are sufficient for long-term preservation, perhaps even ideal under current options.
DataONE-enabled preservation: 
  • If DataONE can provide a single point of deposit for subsequent republication of the data into multiple data repositories, that would be a huge attraction for Laura. Note that this can occur via deposition directly into a member node, but then the data might flow through DataONE from that member node to other nodes. Alternatively, DataONE might provide a template or suite of tools for assurance/description which is guaranteed to streamline deposition of the data into any member nodes, which would also be very attractive for Laura. At a minimum, DataONE can provide guidance for choosing an appropriate repository.
  • Data Preservation: Deposits the data and metadata in the LTER data repository.
  • Laura wants to see evidence that her data are being used and are having an impact. If DataONE provides that evidence, Laura would be willing to spend even more time and effort on data deposition.
  • Citation: Another scientist working on a similar study discovers the new publication and data created by Laura and cites her in his work.
  • Laura currently feels that her data management processes are sufficient for long-term preservation, but if DataONE enhances this offering in some way, that is of interest and could be a motivating factor.
  • Data Preservation: Submits a research paper to an ecological journal associated with Dryad, a DataONE Member Node. Upon acceptance, she submits the publication-relevant data, metadata, and model to Dryad, where they are given a DOI (digital object identifier) and preserved in the Dryad repository.
Current data discovery: 
  • Laura is already interested in other datasets, both to enhance her own research (i.e., put it into context) and to identify possible collaborative opportunities. But to date she has had little to no success either in identifying collaborators via data of shared interest or in identifying other useful datasets. It is her sense that new tools are needed, and that existing repositories are not user-friendly enough to be worth her time.
  • Laura will occasionally identify possible collaborators based on papers she reads or talks she attends, but she has not found any good way to find their existing data to ascertain the potential for collaboration, and she is reluctant to spend the time contacting these people and building trust when she is not sure if there is any real collaborative opportunity.
DataONE-enabled discovery: 
  • If DataONE makes it easy to discover datasets of interest, Laura is likely to become a power user.
  • Laura is likely to search for datasets by authorship, paper-association, etc. Easy faceted search capability is going to be a key feature of DataONE.
  • If the system can use Laura's own data as a point of reference for making automated recommendations of possibly related datasets, that would complete the data cycle for Laura and drive her interest in getting her data into the system.
  • Data Discovery, Access, Use and Dissemination: Acquires supplemental data from another DataONE Member Node with complete citation information.
  • Data Discovery, Access, Use and Dissemination: Identifies relevant data and downloads data and metadata from previous LTER studies as well as data collected by state and federal agency scientists (i.e., non-LTER).
Current data integration: 

Laura understands the logic of data integration, but she does not currently do any integration with datasets that she did not collect herself.

DataONE-enabled integration: 
  • If DataONE provides a toolset for integrating disparate datasets, Laura is likely to become a power user.
  • Another design option for DataONE is to enable data transforms and annotations during the process of discovery so that downloaded data already exhibit some of the necessary characteristics for integration (using some third-party tool).
  • Laura will want to be able to (re)publish the integrated dataset so that others can build on her work and insights. It's not yet clear what attribution expectations she might have for these derivative datasets, nor how much work she would be willing to do to attribute the original data contributors.
Current data analyses: 
  • Laura analyzes all of her existing data using standard statistical packages, using data stored on her own hard drive.
  • Upon completing her analyses, she typically only captures and publishes the statistical results (and relevant summary variables) due to the lack of any appropriate place to publish more information or visuals.
  • In papers and publications, Laura typically provides a link to the entire dataset, if it has been deposited into a repository at that point. She does not provide (and does not know how to provide) direct links to the specific subsets of the data relevant to any analysis.
DataONE-enabled analysis: 
  • If DataONE provides robust analytical tools, there is a good chance that Laura will leverage those offerings. However, Laura is more likely to export any data of interest to perform the analyses using her current methods and tools.
  • She would be even more interested in analytics within DataONE if the DataONE system can track and record the specific methods of analysis, enabling direct reference to the subset of the data used for each analysis and also enabling anyone to reconstitute the analysis (step by step, ideally) to examine the assumptions, transformations, and other aspects of her thinking. This functionality could be provided as a link to an executable script which also serves as a sort of surrogate for attribution.
  • Data Visualization: Creates graphics using tools identified via DataONE.
  • Data Visualization: Uses data analysis and visualization tools identified through the DataONE Tools Database or available as part of the Investigator Toolkit to analyze existing data and develop initial model parameters that she will use in her own research.
Source: 

Data Conservancy Laura persona by Anne Thessen; revised by Kevin Crowston