Querying Scientific Workflow Provenance
Mr. Song is a PhD student in Computer Science at the University of California, Davis. He received his BS in Applied Biological Science and Enterprise Management from Zhejiang University. His research interests include graph theory and algorithm design, database query optimization, and large data analysis and integration.
Scientific workflow systems are used to compose and automate complex computational pipelines from pre-existing software components. An important feature of these systems is their ability to record provenance information. Provenance includes the processing history and lineage of data, and can be used, for example, to validate data products, debug workflows, and document authorship and attribution chains, thus facilitating “reproducible science”. The DataONE Working Group on Provenance (ProvWG) has developed a provenance model for scientific workflows, D-OPM (DataONE-OPM), which is based on the general-purpose Open Provenance Model (OPM) and extended with workflow-specific features. The goal of this year's summer project is to implement a special-purpose query language for provenance and workflow graphs, building on prior work by the mentors and on state-of-the-art languages and techniques from graph-based and declarative query languages. In particular, the system will allow the user to express a provenance query as a path expression or "graph pattern". The query is then translated to a lower-level representation, which in turn is executed on an existing database engine. The resulting prototype will form a starting point for the DataONE cyberinfrastructure to support provenance analytics.
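To make the translation step concrete, here is a minimal sketch of how a simple path expression might be compiled to SQL and run on an existing database engine. It is not the project's actual language or implementation. The `edge(src, label, dst)` table, the function names, the sample file and process names, and the restriction to patterns of the form `(l1|l2|...)+` are all assumptions made for illustration; only the edge labels `used`, `wasGeneratedBy`, and `wasControlledBy` come from OPM. SQLite stands in for the database engine.

```python
import sqlite3


def path_to_sql(labels):
    """Translate the path pattern (l1|l2|...)+ into a recursive SQL query.

    Hypothetical translator: it only handles the transitive closure
    over a set of edge labels, starting from one given node.
    """
    placeholders = ", ".join("?" for _ in labels)
    return f"""
    WITH RECURSIVE reach(node) AS (
        -- base case: nodes one matching edge away from the start node
        SELECT dst FROM edge WHERE src = ? AND label IN ({placeholders})
        UNION
        -- recursive case: follow one more matching edge
        SELECT e.dst FROM edge e JOIN reach r ON e.src = r.node
        WHERE e.label IN ({placeholders})
    )
    SELECT node FROM reach ORDER BY node
    """


def run_path_query(conn, start, labels):
    """Run the translated query and return the reachable nodes, sorted."""
    sql = path_to_sql(labels)
    # Parameter order follows the "?" order in the SQL text:
    # start node, labels for the base case, labels for the recursive case.
    params = [start, *labels, *labels]
    return [row[0] for row in conn.execute(sql, params)]


def build_example_db():
    """Build a tiny in-memory provenance graph (assumed sample data).

    Edges point from effect to cause, as in OPM.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, label TEXT, dst TEXT)")
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?, ?)",
        [
            ("plot.png", "wasGeneratedBy", "render"),
            ("render", "used", "clean.csv"),
            ("clean.csv", "wasGeneratedBy", "filter"),
            ("filter", "used", "raw.csv"),
            ("render", "wasControlledBy", "alice"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_example_db()
    # Lineage of plot.png: everything reachable via (used|wasGeneratedBy)+
    print(run_path_query(conn, "plot.png", ["used", "wasGeneratedBy"]))
```

In this sketch the lineage query for `plot.png` follows only `used` and `wasGeneratedBy` edges, so the controlling agent `alice` is not part of the result. A recursive common table expression is one plausible lower-level target for such patterns; a fuller translator would also need to handle sequences, alternation inside longer paths, and patterns with several variables.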