Provenance as a First-class Citizen in DataONE
Parisa Kianmajd is a PhD student in Computer Science at the University of California, Davis. Prior to joining the PhD program, she received her master's degree in Information Security. Her research interests include, but are not limited to, applied cryptography, database security, privacy, and privacy-aware provenance.
The goal of this project is to develop a feature-rich provenance management architecture, which we call PBase, that integrates with the core DataONE architecture. To achieve this, we will combine two strands of work that the Provenance WG has been pursuing for the past two years. The first, Golden-Trail: A Provenance Repository For Storing And Retrieving Data Lineage Information (2010), focused on the realization of a common provenance model (D-PROV), a provenance repository, and an interactive user interface (Golden-Trail). The second effort (2012) has centered on using the member nodes’ Data Packaging features in combination with provenance-aware workflow execution.
The intern will develop a prototype of PBase by building upon this prior work. The prototype will demonstrate the benefits of an architectural stack that includes advanced query and analytics capabilities over a corpus of provenance traces, which are associated with data stored in Data Packages within member nodes. It will also enable the composition of provenance fragments produced separately by workflows that are independent and yet share some of their data, a natural occurrence in e-science. At the same time, we will retain two advantages demonstrated in our most recent prototype: the use of provenance terms for data discovery, and the storage of workflows, their data, and their provenance in self-contained packages.
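To make the idea of composing provenance fragments concrete, here is a minimal Python sketch. It is not the PBase or D-PROV implementation: the edge format (derived item, process, source item), the function names compose and lineage, and the file and workflow names are all hypothetical, chosen only for illustration. It assumes that two independent workflows refer to a shared data item by the same identifier, so that their traces can be joined on it.

```python
from collections import defaultdict

# Hypothetical, simplified trace format (not the D-PROV schema):
# each trace is a list of (derived_id, process, source_id) edges.


def compose(*traces):
    """Merge several traces into one graph: derived_id -> {(process, source_id)}."""
    graph = defaultdict(set)
    for trace in traces:
        for derived, process, source in trace:
            graph[derived].add((process, source))
    return graph


def lineage(graph, data_id):
    """Return every data item that data_id transitively depends on."""
    seen = set()
    stack = [data_id]
    while stack:
        current = stack.pop()
        # .get avoids creating empty entries in the defaultdict
        for _process, source in graph.get(current, ()):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# Two independent workflows that share the item "cleaned.csv".
trace_a = [("cleaned.csv", "workflow_a:clean", "raw.csv")]
trace_b = [
    ("model.nc", "workflow_b:fit", "cleaned.csv"),
    ("plot.png", "workflow_b:render", "model.nc"),
]

combined = compose(trace_a, trace_b)
upstream = lineage(combined, "plot.png")
```

With only trace_b, the lineage of plot.png stops at cleaned.csv. Once the two fragments are composed, the query also reaches raw.csv, which is the kind of cross-workflow question a shared provenance repository is meant to answer.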
Workflows may come from different systems. We therefore aim to show that the provenance traces collected from those systems are interoperable by means of our unified D-PROV provenance data model.