Ecologists, as a group, seek out adventure, natural wonder, and, let’s face it, sometimes hardship. Rather than being deterred by remote locations and inhospitable environments, they are inspired. Speak to an ecologist, and you’re likely to find out very quickly about the particular location in which they conduct their research and hear amazing stories about the difficulties they endure to collect their data.
A rarer breed of ecologist chooses (at least sometimes) to forgo the adventure, wonder, and frustration of collecting field data for an entirely different (and likely less obvious) set of challenges and exhilarating successes; these brave and adventurous ecologists choose to work with data that have already been collected by other researchers. Members of a research team studying climate change effects on the lakes of the world are great examples of this type of intrepid ecologist.
Buoyed by optimism and excitement about what they would find when analyzing data from lakes all over the world, the researchers set out to collect as many lake data sets as they could. Because they already knew about some of the challenges involved in working with existing data, they made a plan for how to organize and manage all of the data they were collecting. Each member of the team was in charge of contacting a few other researchers who might have data that could be used for the project. When researchers offered to share their data and a new data set was received, the team used a dedicated Dropbox™ folder to share and archive the data.
The project was shaping up quite nicely, with a good number of data sets collected and ready for analysis. They were scheduled to give a conference presentation on the project, excited to have the chance to talk about the work with their colleagues, and, as is common, also relying on the conference deadline to motivate them to move forward with the analysis. It was during the last-minute preparation for the conference that the first twinges of anxiety began to surface. The team member who was performing the analyses began to notice that many of the data sets were missing metadata. Metadata, or ‘data about data’, include information about how, by whom, and for what purposes the data were created and to what exactly the data values refer. Without such metadata, it is impossible to understand exactly how a data set can be used. For the analysis of the lake data sets, the researchers needed information about exactly where the lakes were located (e.g., longitude and latitude), and the depth and surface area of each lake from which data were collected.
As anxiety about the missing metadata began to turn into panic under the weight of the presentation deadline, the ecologist contacted his collaborators and requested that they help out with some emergency ‘googling’ to see how much of the metadata they could find online. They were able to compile enough metadata to successfully go through with the presentation, but unfortunately some of the lake data had to be left out of the analysis because the team was unable to find all of the information it needed. Although the presentation went well, the team was secretly a little disappointed that they hadn’t been able to include all of the data they had collected. When they returned from the conference, they set to work contacting the data contributors to request the missing metadata so that all of the available data could be included in future analyses.
Discussion points: How did the research team get into this situation? They had been proactive about setting up a procedure and platform for collecting, sharing, and managing the data for the study. How did the absence of crucial metadata slip past them? What could they have done to avoid this oversight?
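One possible starting point for that last question is an automated completeness check, run whenever a new data set arrives in the shared folder. The Python sketch below is only an illustration and not the team's actual procedure. The field names (latitude, longitude, depth_m, surface_area_km2), the function names, and the example records are all hypothetical; the story says only that location, depth, and surface area were needed.

```python
# Hypothetical field names: the story says only that location (latitude and
# longitude), depth, and surface area were needed for each lake.
REQUIRED_FIELDS = ("latitude", "longitude", "depth_m", "surface_area_km2")


def missing_metadata(record):
    """Return the required fields that are absent or empty in one metadata record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if record.get(field) in (None, "")
    ]


def audit(records):
    """Map each data set name to its missing fields, skipping complete records."""
    report = {}
    for name, record in records.items():
        gaps = missing_metadata(record)
        if gaps:
            report[name] = gaps
    return report


# Made-up example records, for illustration only.
submissions = {
    "lake_a": {"latitude": 45.0, "longitude": -80.0,
               "depth_m": 12.5, "surface_area_km2": 3.2},
    "lake_b": {"latitude": 60.1, "longitude": 25.3},
}

for name, gaps in audit(submissions).items():
    print(f"{name}: missing {', '.join(gaps)}")
```

Run at intake rather than at analysis time, a check like this would flag incomplete submissions while the contributor is still in active contact, instead of days before a deadline.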
Story contributed by Dr. Derek Gray with additional information from Kara Woo.
Image: CC-BY-NC-SA by Frank Maurer via flickr