

Exploration of Search Logs, Metadata Quality and Data Discovery

Ed Flathers

Edward is a PhD candidate in the College of Natural Resources at the University of Idaho. His research focuses on the design of ecoinformatics software systems that support collaboration and open science, data sharing and re-use, and reproducible research practices. His background includes a master's degree in statistical sciences and many years of experience as a software developer. When he has the opportunity, Edward enjoys travel, especially when it includes fishing or hiking.

Project Description: 

The ability to search for and discover data held within a given repository is related to both the capabilities of the search engine and the quality of the metadata describing the data set. The amount and quality of metadata that an author provides for a shared data set vary widely. At a minimum, contributors might list the data authors, the study system, and the date, time and place of data collection. A rich metadata record would provide everything required to download and synthesize or reanalyze the data, including a comprehensive abstract describing the work, details of the structure and contents of the data, and full methodological information to properly interpret data values. It is on these metadata terms that the search query operates.

The project has three parts.

  • First, the project will examine DataONE search logs from 2012 to 2017 to explore patterns in how users search. What search terms were used? How much metadata was queried (how many search terms were used)? How many results were returned for various searches? Did a search result in a data download from member repositories? If so, from a single repository or several?
  • Second, taking advantage of the Quality Report feature available in one of the Member Nodes (the Arctic Data Center), the project will explore the metadata characteristics of downloaded data in comparison with the population of data available within that repository.
  • Finally, to directly test the importance of metadata quality, a series of standardized searches will be conducted against a replica of the Arctic Data Center catalog. The metadata will then be manipulated and the search queries repeated to evaluate the impact on discovery and its relationship with the metadata quality reports.
  • The results of the project will be prepared for presentation and/or publication.
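
The log questions in the first part (which terms were used, how many terms per query, how many results were returned) could be summarized along the following lines. This is only a minimal sketch: the record layout, the `summarize_queries` helper, and the sample rows are all hypothetical and invented for illustration. They do not reflect the actual DataONE log format or real search data.

```python
from collections import Counter
from statistics import mean

# Hypothetical log records: (query string, number of results returned).
# These rows are invented for illustration; they are not DataONE data.
SAMPLE_LOG = [
    ("arctic sea ice", 120),
    ("soil carbon", 45),
    ("sea ice thickness 2015", 8),
    ("salmon", 300),
]


def summarize_queries(records):
    """Summarize term usage, terms per query, and results per query."""
    term_counts = Counter()
    terms_per_query = []
    result_counts = []
    for query, n_results in records:
        # Assumption: terms are separated by whitespace and case is ignored.
        terms = query.lower().split()
        term_counts.update(terms)
        terms_per_query.append(len(terms))
        result_counts.append(n_results)
    return {
        "top_terms": term_counts.most_common(3),
        "mean_terms_per_query": mean(terms_per_query),
        "mean_results_per_query": mean(result_counts),
    }


summary = summarize_queries(SAMPLE_LOG)
print(summary)
```

Splitting on whitespace is the simplest possible definition of a "search term"; real queries may contain quoted phrases or field filters that would need their own parsing.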

Primary Mentors:
Lauren Walker and Amber Budden
Secondary Mentors:
Jesse Goldstein, Jeanette Clark, and Matt Jones