Integrating Loosely Structured Data Into the Linked Open Data Cloud

Aida Gandara
Aida Gandara is a PhD student in the Computer Science Department at the University of Texas at El Paso. She is a Cyber-ShARE research student in the final year of her dissertation research. Her research focuses on collaborative scientific systems, in particular on helping scientific teams describe and discuss their research so that they can share it over the Semantic Web.

Project Description: 

The Linked Data conventions describe four principles that allow data of any kind, from any online source, to form a global, interconnected web of data: i) name every "thing" that has data or information associated with it; ii) use HTTP URIs as those names; iii) when someone looks up such a URI, provide useful information in Resource Description Framework (RDF) format; and iv) within the information provided this way, link to other common "things", such as points or axes of reference, and use common vocabularies to attach meaning to links wherever possible.

The goal of this project is to develop an exploratory prototype, along with practical recommendations drawn from it, for exposing the heterogeneous and loosely structured data held in non-specialized DataONE member nodes to the Linked (Open) Data cloud. The approach would be to obtain a sufficiently representative sample of data sets from DataONE's three initial member nodes (Dryad, KNB, and ORNL-DAAC) and to use them as instance data for defining the RDF predicate vocabularies, domain ontologies, resource URIs, and conversion mechanisms needed to create a LOD representation of these data. This representation can then be uploaded to, navigated, and queried in one of the web-based LOD browsers (such as URIburner) or, for example, in a local installation of OpenLink Virtuoso.
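To make the four principles concrete, here is a minimal Python sketch (standard library only) of one possible conversion mechanism. It mints an HTTP URI for each data set record, emits RDF in the N-Triples syntax, and maps loosely named fields onto Dublin Core Terms and DCAT, both existing public vocabularies. The base URI, the record field names, the sample identifier, and the field-to-predicate mapping are hypothetical assumptions chosen for illustration; they are not part of DataONE, the member nodes' interfaces, or the project's actual design.

```python
"""Minimal sketch: turn loosely structured data set records into N-Triples.

Hypothetical assumptions: BASE, the record field names ("node", "id",
"title", "creator", "subject"), and FIELD_TO_PREDICATE are illustrative only.
"""
from urllib.parse import quote

# Hypothetical namespace under which data set URIs are minted (principles i, ii).
BASE = "http://example.org/dataone/dataset/"

# Assumed mapping of loosely named source fields onto a common vocabulary
# (principle iv). Dublin Core Terms and DCAT are real vocabularies.
FIELD_TO_PREDICATE = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
    "subject": "http://purl.org/dc/terms/subject",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"

# Characters that may not appear inside an N-Triples IRI reference.
_IRI_FORBIDDEN = set('<>"{}|^`\\')


def mint_uri(node: str, identifier: str) -> str:
    """Build an HTTP URI for a data set from its member node and local id."""
    return BASE + quote(node, safe="") + "/" + quote(identifier, safe="")


def literal(value: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def looks_like_iri(value: str) -> bool:
    """Heuristic: absolute http(s) strings without illegal characters."""
    return value.startswith(("http://", "https://")) and not any(
        c in _IRI_FORBIDDEN or ord(c) <= 0x20 for c in value
    )


def record_to_ntriples(record: dict) -> list[str]:
    """Convert one record (requires "node" and "id") into N-Triples lines."""
    subject = f"<{mint_uri(record['node'], record['id'])}>"
    lines = [f"{subject} <{RDF_TYPE}> <{DCAT_DATASET}> ."]
    for field, predicate in FIELD_TO_PREDICATE.items():
        value = record.get(field)
        if value is None:
            continue  # loosely structured input: tolerate missing fields
        # A field may hold a single value or a list of values.
        values = value if isinstance(value, list) else [value]
        for v in values:
            text = str(v)
            # Link to an existing resource when the value is already a URI
            # (principle iv); otherwise store it as a plain literal.
            obj = f"<{text}>" if looks_like_iri(text) else literal(text)
            lines.append(f"{subject} <{predicate}> {obj} .")
    return lines


if __name__ == "__main__":
    # Hypothetical sample record; the id and concept URI are made up.
    sample = {
        "node": "KNB",
        "id": "knb.123.1",
        "title": "Stream temperature",
        "subject": ["hydrology", "http://example.org/concept/temperature"],
    }
    for line in record_to_ntriples(sample):
        print(line)
```

N-Triples is used here only because it is trivial to emit without dependencies; the resulting file could be loaded into a triple store such as OpenLink Virtuoso. A fuller prototype would more likely rely on an RDF library and on vocabularies chosen together with domain experts.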

Primary Mentor: 
Hilmar Lapp