Angela Murillo is a second-year doctoral student at the School of Information and Library Science at the University of North Carolina at Chapel Hill. She received her bachelor’s degrees in Geosciences, English, and Spanish and her MLIS from the University of Iowa. Angela was the Project Manager for the DigCCurr (http://ils.unc.edu/digccurr/index.html) Project and was also an Earth Science Information Partners (http://www.esipfed.org/) student fellow. Her research interests include scientific data management, scientific data sharing and reuse, and metadata and interoperability.
Primary Mentor: Jane Greenberg
Secondary Mentor: John Kunze, Joan Boone
Secondary Mentor: Heather Piwowar
Secondary Mentor: Giri Palanisamy, Natasha Noy, Jeff Horsburgh
Secondary Mentor: Andrew Sallans, Sherry Lake
Karina Kervin is a PhD student at the School of Information at the University of Michigan. Her research interest is understanding scientific data practices in order to apply human-computer interaction concepts to scientific information visualization and data management tools. She graduated from the University of Kansas in 2009 with a Bachelor of Science in Computer Science.
As many at DataONE are aware, growing data volumes and complex global problems call for more data sharing. Unfortunately, there is little guidance on how to share data effectively. The goal of the initial stage of this project is therefore to identify some of the most common mistakes researchers make when preparing data for publication. Over the past week, I’ve brought myself up to speed on current recommendations for ecological metadata best practices. I then started reading reviewer comments on accepted data papers to see which problems reviewers were finding in the explanations of the data (the metadata). Some of the overarching problems identified in this first pass are inadequate descriptions of the geospatial locations sampled and unclear methodology.
Primary Mentor: Bill Michener
Secondary Mentor: Bob Cook
Mr. Song is a PhD student in Computer Science at the University of California, Davis. He received his BS in Applied Biological Science and Enterprise Management at Zhejiang University. His research interests include graph theory and algorithm design, database query optimization, and large data analysis and integration.
Scientific workflow systems are used to compose and automate complex computational pipelines from pre-existing software components. An important feature of scientific workflow systems is their ability to record provenance information. Provenance includes the processing history and lineage of data. It can be used, for example, to validate data products, debug workflows, and document authorship and attribution chains, and it thus facilitates “reproducible science”. The DataONE Working Group on Provenance (ProvWG) has developed a provenance model for scientific workflows, D-OPM (DataONE-OPM), which is based on the general-purpose Open Provenance Model (OPM) and extended with workflow-specific features. The goal of this year's summer project is to implement a special-purpose query language for provenance and workflow graphs, based on prior work by the mentors and on state-of-the-art languages and techniques known from graph-based and declarative query languages. In particular, the system will allow the user to express a provenance query as a path expression or "graph pattern". The query is then translated to a lower-level representation, which in turn is executed on an existing database engine. The resulting prototype will form a starting point for the DataONE cyberinfrastructure to support provenance analytics.
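To make the idea of translating a path expression into a lower-level query concrete, here is a minimal Python sketch. It is not the project's query language or the D-OPM schema: the `edge(src, label, dst)` table, the toy syntax (`a/b` for a fixed sequence of edge labels, `(a|b)+` for a transitive closure), and the helper names `translate`, `query`, and `demo_db` are all assumptions made for illustration, and SQLite stands in for the "existing database engine". The edge labels `used` and `wasGeneratedBy` follow OPM, with edges pointing from effect to cause.

```python
import re
import sqlite3

# Hypothetical schema: one row per provenance edge, pointing from effect to cause.
SCHEMA = "CREATE TABLE edge (src TEXT, label TEXT, dst TEXT)"


def translate(expr, start):
    """Translate a toy path expression into (sql, params) for SQLite."""
    closure = re.fullmatch(r"\((\w+(?:\|\w+)*)\)\+", expr)
    if closure:
        # (a|b)+ : every node reachable over one or more edges with these labels.
        labels = closure.group(1).split("|")
        marks = ",".join("?" * len(labels))
        sql = (
            "WITH RECURSIVE reach(node) AS ("
            f" SELECT dst FROM edge WHERE src = ? AND label IN ({marks})"
            " UNION"
            " SELECT e.dst FROM edge e JOIN reach r ON e.src = r.node"
            f" WHERE e.label IN ({marks})"
            ") SELECT node FROM reach ORDER BY node"
        )
        return sql, [start, *labels, *labels]
    if re.fullmatch(r"\w+(?:/\w+)*", expr):
        # a/b/c : a fixed-length path, compiled to a chain of self-joins.
        labels = expr.split("/")
        joins = "".join(
            f" JOIN edge e{i} ON e{i}.src = e{i - 1}.dst AND e{i}.label = ?"
            for i in range(1, len(labels))
        )
        sql = (
            f"SELECT DISTINCT e{len(labels) - 1}.dst FROM edge e0{joins}"
            " WHERE e0.src = ? AND e0.label = ? ORDER BY 1"
        )
        # Placeholders in the JOIN clauses come before those in WHERE.
        return sql, [*labels[1:], start, labels[0]]
    raise ValueError(f"unsupported path expression: {expr!r}")


def query(conn, start, expr):
    """Run a path expression from a start node and return the matching nodes."""
    sql, params = translate(expr, start)
    return [row[0] for row in conn.execute(sql, params)]


def demo_db():
    """Build a tiny in-memory provenance graph (made-up example data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?, ?)",
        [
            ("result.csv", "wasGeneratedBy", "analyze"),
            ("analyze", "used", "clean.csv"),
            ("clean.csv", "wasGeneratedBy", "clean"),
            ("clean", "used", "raw.csv"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_db()
    # Full lineage of result.csv: every process and artifact it depends on.
    print(query(conn, "result.csv", "(used|wasGeneratedBy)+"))
    # Direct inputs of the process that generated result.csv.
    print(query(conn, "result.csv", "wasGeneratedBy/used"))
```

The closure form compiles to a recursive common table expression, while the sequence form compiles to plain joins, so the user never writes SQL directly. A real implementation would need a proper grammar and a richer graph model than this two-case sketch.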
Primary Mentor: Bertram Ludäscher
Secondary Mentor: Paolo Missier, Shawn Bowers