Data Integration and Semantics

The Data Integration and Semantics Working Group conducts research into the specification, adoption, and implementation of semantic technologies that will enable DataONE to achieve its data discovery and integration objectives. The Working Group aims to remove barriers to data interoperability, with particular emphasis on the kinds of trans-disciplinary science needed to address critical science issues. The activities of the Working Group include: developing and documenting use cases; evaluating standards, tools, and technologies; developing a roadmap for research and implementation; facilitating collaboration with other relevant research projects and DataONE working groups; and prototyping technology in collaboration with the DataONE Core Cyberinfrastructure Team (CCIT).

WG Charter (pdf)

Working Group Leaders

  • Deborah McGuinness, Rensselaer Polytechnic Institute
  • Mark Schildhauer, National Center for Ecological Analysis and Synthesis

Working Group Members

  • Luis Bermudez, Open Geospatial Consortium
  • Damian Gessler, iPlant
  • Carl Lagoze, Cornell University
  • Hilmar Lapp, National Evolutionary Synthesis Center (NESCent)
  • Natasha Noy, Stanford University
  • Margaret O'Brien, Santa Barbara Coastal LTER
  • Line Pouchard, Oak Ridge National Laboratory
  • Paulo Pinheiro da Silva, Pacific Northwest Laboratory
  • Patrice Seyed, Rensselaer Polytechnic Institute (DataONE postdoc)

Past Working Group Members

  • Jeff Horsburgh, Utah State University
  • Carlos Rueda, Marine Metadata Initiative

Current Projects

Adding Semantic Search to ONEMercury: Working group members are currently working to add semantic search capabilities to the ONEMercury Search Interface, DataONE’s primary online data discovery portal. Mercury is a federated metadata system that was developed at the Oak Ridge National Laboratory Distributed Data Archive Center (ORNL DAAC), and ONEMercury is an adaptation of Mercury to meet the needs of DataONE. With the breadth of scientific data that will eventually be housed within DataONE, pinpointing datasets relevant to a particular query with high recall and specificity will be challenging. Working group members are employing an ontology-based solution that improves Mercury’s faceted search and indicates a path forward for linking data from environmental, ecological, and biological sciences.
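
As a rough illustration of how an ontology can raise recall in keyword search, the sketch below expands a query term to all of its narrower terms before matching it against dataset keywords. It is a toy under stated assumptions, not the ONEMercury implementation: the ontology, the dataset records, and the function names `expand` and `search` are all hypothetical.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its narrower (child) terms.
NARROWER = {
    "water body": ["lake", "river"],
    "river": ["stream"],
}

# Hypothetical metadata records: dataset id -> set of keywords.
RECORDS = {
    "ds1": {"lake", "temperature"},
    "ds2": {"stream", "nitrate"},
    "ds3": {"soil", "carbon"},
}


def expand(term, narrower=NARROWER):
    """Return the term together with all transitively narrower terms."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(term, records=RECORDS):
    """Return sorted ids of datasets tagged with the term or a narrower one."""
    terms = expand(term)
    return sorted(ds for ds, keywords in records.items() if keywords & terms)


if __name__ == "__main__":
    # A plain keyword match on "water body" would find nothing here;
    # the expanded query also matches "lake" and "stream".
    print(search("water body"))
```

In this sketch a query for a broad concept retrieves datasets that are only tagged with more specific terms, which is one way an ontology can improve recall without the user having to know every keyword in advance.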

Interdisciplinary Use Cases: Working group members are also building interdisciplinary use cases that demonstrate how semantic technologies can enhance data discovery and integration. One of the focus areas is the use of semantic technologies in discovering, integrating, and using datasets related to the water environment.

Previous Projects

Aida Gandara, a graduate student at the University of Texas at El Paso, worked with Hilmar Lapp on a summer internship project titled LOD4DataONE, which focused on integrating loosely structured data into the Linked Open Data cloud. Aida developed an exploratory prototype and, based on it, practical recommendations for how the heterogeneous and loosely structured data held in non-specialized DataONE member nodes can be exposed to the Linked (Open) Data cloud. Results of Aida’s work can be found in the open notebook she kept for the project at https://notebooks.dataone.org/lod4dataone/.

Project Outcomes

  • Pouchard, L.C., R.B. Cook, J. Green, G. Palanisamy, N. Noy (2011), Semantic technologies improving recall and precision of the Mercury search engine, Abstract IN31B-1437 presented at 2011 Fall Meeting, AGU, San Francisco, CA, 5-9 Dec.
  • Gandara, A. (2011), LOD4DataONE – Integrating loosely structured data into the Linked Open Data cloud, Results of DataONE Summer Student Internship, https://notebooks.dataone.org/lod4dataone/.