Next Generation Data Environment: Semantically-Enabling the DataONE Metadata Environment

Kate Chastain

Katie just finished her first year in the Information Technology and Web
Science Master's program at Rensselaer Polytechnic Institute in upstate New York.
Her upbringing in northern Virginia, just outside of Washington, DC,
meant that she had seen snow prior to this winter, but New York was still
quite a change weather-wise from where she completed her undergraduate
work, in St. Petersburg, Florida. There, at Eckerd College, she earned a
degree in Computer Science, as well as in East Asian Studies and Modern
Languages. This summer, she is working on additional features for the
Next Generation Data Environment, a browser-based tool for converting
comma-separated data tables into linked data format. In her free time,
she enjoys good science fiction; bad puns; watching hockey and baseball;
and embarking on cooking adventures.

Project Description: 

The summer intern will develop a web-based interface that lets any user semantically enable data and metadata. Within the application workflow, users will be able to link their data to concepts selected from established community ontologies, such as OBO-E, leveraging backend vocabulary services developed by Patrice Seyed (postdoc for the DataONE Semantics and Interoperability Working Group). The interface will use formal reasoning to help users make selections that are constrained by their previous selections of classes and properties, based on how these objects are defined in their respective ontologies, while also helping them verify the set of inferences that follow from all selections. The design will also let users identify implicit domain entities (e.g., when a measurement data record refers to multiple samples or organisms rather than one), which is useful when only the data table's creator clearly understands them, and flexibly encode their representation in the transformed data. The project extends previous semantic data enablement projects at Rensselaer Polytechnic Institute (RPI) and the National Center for Ecological Analysis and Synthesis (NCEAS), including CSV2RDF4LOD from RPI, which converts tabular data into RDF statements based on user-provided configurations, and Morpho from NCEAS’s Semtools project, which annotates tabular data by applying the OBO-E ontology model of scientific observation. Researchers involved in these projects are mentors for this proposal and are available for guidance. The resulting transformed data will link back to the original data and its source using provenance-centric ontologies (PROV-O, PML3), and will be available for discovery and granular search of datasets described through DataONE’s metadata environment.
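As a rough illustration of the kind of transformation described above, the following sketch converts a comma-separated table into N-Triples statements and links each row back to its source file with the PROV-O property prov:wasDerivedFrom. It is not the project's implementation and does not use CSV2RDF4LOD. The example.org namespaces, the function name csv_to_ntriples, and the column-to-property mapping are all assumptions made for this example.

```python
import csv
import io

# Hypothetical namespaces, chosen only for illustration.
BASE = "http://example.org/dataset/"
VOCAB = "http://example.org/vocab/"
# Real PROV-O property used to point back at the original data.
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"


def csv_to_ntriples(csv_text, column_map, source_uri):
    """Turn each CSV row into N-Triples lines.

    column_map maps a column header to the property URI the user selected.
    Every row resource is linked back to source_uri via prov:wasDerivedFrom.
    """
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{BASE}row/{index}>"
        # Provenance link from the row resource to the original table.
        triples.append(f"{subject} <{PROV_DERIVED}> <{source_uri}> .")
        for column, prop in column_map.items():
            # Escape backslashes and quotes so the literal stays valid.
            value = row[column].replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{prop}> "{value}" .')
    return triples


# Usage with a made-up two-column table.
sample = "species,height_cm\nQuercus alba,120\n"
mapping = {"species": VOCAB + "species", "height_cm": VOCAB + "heightInCm"}
for line in csv_to_ntriples(sample, mapping, BASE + "source.csv"):
    print(line)
```

In this sketch the user's ontology selections are reduced to a plain dictionary. In the interface described above, those selections would instead come from the vocabulary services and be constrained by reasoning over the chosen ontologies.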

Primary Mentor: 
Patrice Seyed, Deborah McGuinness