The Long and Winding Road to Public Data

Dr. Watson was accustomed to seeing dead things. As a wildlife ecologist, she had made a career out of investigating animals and their untimely demise under the rumbling engines of motor vehicles. Animal road mortalities had a reputation for being difficult to track because the majority of incidents went unreported, so Watson had to come up with creative alternatives for obtaining data. To answer the questions she was most interested in, Watson needed to know exactly where these accidents were happening. The limited data she could access through the police department, however, rarely contained this level of detail. Instead, she had learned to make use of contacts at local agencies and share data back and forth as needed. Watson had developed a reputation for being a friendly, trustworthy collaborator, and her colleagues were happy to help where they could.

Manuel, a graduate student at the university where Watson taught, also had a fascination with the ecological impact of road traffic. Manuel was planning a thesis on road mortality patterns in deer, and was anxious to find more data… but he was stuck. He had neither the resources for collecting new data nor the connections or institutional know-how to locate existing sources. He asked Watson if she knew of any datasets from New England that might contain the data he needed. Though she listened patiently to Manuel and sympathized with his dilemma, Watson informed him that the data simply did not exist.

But Manuel was not to be put off. In his native country, data on animal collision fatalities were plentiful – drivers were required by law to report all accidents involving wildlife. Each year, the body of data grew (especially during times of increased animal activity, like mating seasons). Surely someone, somewhere in New England, recognized the value of that data and was maintaining the stats he needed for his work? If so, couldn’t Dr. Watson help locate it? After all, what did Dr. Watson have to lose?

Seeing that Manuel would not be deterred, Dr. Watson agreed to ask around and determine if such a dataset could possibly exist… but she was more than skeptical. Deer were frequently the victims of car accidents all over the northeast; their grisly remains littered roadsides across the region, and it was highly unlikely that anyone was keeping tally on what was merely an unfortunate fact of life. “I’ll make some inquiries,” Watson promised, “but let’s not get our hopes up.” Manuel nodded and made his exit, though Watson was sure she would hear from him again soon.

Who would she turn to for help? Dr. Watson had of course made many contacts over the years, so she had a few candidates in mind. She would email those that might have an interest in or know of such a dataset (if it even existed). They would also have to be well-connected within their various agencies; ten years on the job had taught her that many of the state organizations operated as silos, wrapped up in their own affairs and cut off from one another’s efforts. She would need to cast a wide net – several, in fact – if she hoped to get a glimpse of this mythological dataset.

After clicking through her email contact list a few times, Dr. Watson concluded her search with a grand total of five names. Four people were managers of their agencies, and the remaining person was a researcher with an outstanding history of cross-collaboration.

Once composed, the message was short, casual, and to the point. Watson described what she was looking for and what her graduate student wanted with the data. Her requests were minimal: that they contact her if anything turned up, and that they forward the request on to others in their network. When they arrived, the responses were precisely what Watson had imagined. All were friendly and willing to support the search, but none offered even a glimmer of hope for the increasingly lost cause.

“I would love to see a dataset like this,” the lone researcher replied, “but I just don’t think anyone is working on it right now.” Each respondent wished her well and promised to write back with any news, but that was all. Even Watson’s friend at the Department of Transportation turned up empty-handed. The man had spent years as the point person for a multi-agency research project on reducing animal-vehicle collisions; if he couldn’t point Watson to the dataset, it simply was not to be found. She had done what she could, but Watson knew it was time to give up the ghost hunt. Manuel would be disappointed, but he would understand.

Watson sat down at her desk to write a consolatory email to Manuel. “Well, we gave it our best shot,” she began weakly. No sooner had she clicked SEND than a message appeared in her inbox – from the size of it, something big. It was from a woman named Charlotte from some obscure division at the Department of Transportation. Charlotte had heard of Watson’s inquiry from a friend of a colleague of a colleague some ways up the grapevine, and she thought she might be able to help. While they had never met, Charlotte knew of Watson by reputation. What’s more, she was an alumna of the very university where Watson worked!

When Watson opened the impossibly large attachment, she let out a whoop of excitement. It was an Excel spreadsheet with over 27,000 geo-referenced deer road mortalities – the very thing Manuel was looking for.

Watson had gone to the Department of Transportation. She had searched through its online files and poked around its various divisions. All of her searching had convinced her that this dataset, the one sitting in her very inbox, did not and had never existed. So what made this seemingly doomed expedition an unexpected success? Watson was tempted to attribute her and Manuel’s good fortune to serendipity, but knew that luck alone had not delivered the dataset into their hands. She suspected she could learn a thing or two from Manuel’s dogged optimism.

And yet, the whole situation had a foul air about it which had little to do with deer carcasses. When Watson really thought about the circumstances of the data acquisition, it seemed a little silly that so much serendipity, persistence and luck were needed to unearth The Spreadsheet. After all, shouldn’t all data collected using public funds be openly shared to begin with? Why should such a useful resource be locked away in agency fortresses, waiting for the day that a determined graduate student and his advisor finally sniffed it out of obscurity? Research policies were evolving to embrace public data sharing and open access, but not fast enough for other aspiring researchers like Manuel. In the meantime, many of them resigned themselves to more conventional data collection projects after finally accepting that the data they needed did not exist – a hard truth, perhaps, but one that hopefully would not last.


Watch the Film!

DataONE Data Stories: The Long and Winding Road to Public Data from DataONE on Vimeo.

Image: CC-BY-NC-SA by DJOtaku via flickr
Film: Becky Beamer