Build Fundamental Components for Provenance-aware Model Exploration, Evaluation, and Benchmarking Cyber-infrastructure Prototype

Fei Du
Fei Du is a PhD student in Geographic Information Science (GIS) at the University of Wisconsin-Madison. He holds an MS degree in Computer Sciences from UW-Madison and an MS degree in GIS from the Chinese Academy of Sciences. His research interests include geospatial predictive modeling, trajectory data analysis, spatial data mining, and geovisual analytics.

Project Description: 

Earth system modeling is a primary approach to advancing our understanding of the Earth’s biogeochemical cycles, including their interactions with humans, and of climate change. A variety of Earth system models have been developed, using different approaches to address different components of the Earth’s biogeochemical cycles. Although the findings of these modeling efforts are promising, many uncertainties remain in the results.
Model-data intercomparison is an important approach to diagnosing and improving model processes and parameterizations by examining differences among models and differences between models and observations. However, it faces several challenges: 1) model output and observation data are heterogeneous, with different formats, spatial and temporal scales, etc.; 2) tools that address the specific data analysis and visualization needs of model-data intercomparison are lacking; 3) mechanisms to reproduce analyzed data and visualizations and trace them back to their origins are lacking.
To tackle the above challenges, the DataONE EVA working group proposes to build a prototype of a Provenance-aware Model Exploration, Evaluation, and Benchmarking Cyber-infrastructure on top of VisTrails and UV-CDAT, which are open source, workflow-based scientific analysis and visualization frameworks, as described in Figure 1. This infrastructure can integrate distributed data resources from DataONE, the Earth System Grid (ESG), or any user-provided model and observation data repositories through Brokers. The core component of the infrastructure contains libraries of standard modules and workflows for data analysis and visualization. Interfaces will be provided for different types of users to guide them in customizing workflows for their specific model-data intercomparison needs. The infrastructure is linked together with provenance-aware tools so that VisTrails workflows can be converted to standards-based provenance representations and indexed through the DataONE indexing mechanism. Provenance-based data discovery, customization, and reproduction can then be achieved. The analyzed results, together with the associated provenance information, can be packaged and contributed back to DataONE.
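
The idea of packaging an analyzed result together with its provenance can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: it does not use the VisTrails or UV-CDAT APIs, the function names (fingerprint, annual_means, compare) are hypothetical, and the provenance dictionary is an ad hoc layout rather than any standard provenance representation. It aggregates monthly model output to annual means so that it matches the temporal scale of annual observations, computes bias and RMSE, and records which inputs and steps produced the numbers.

```python
import hashlib
import json
import math


def fingerprint(values):
    """Return a short, stable hash identifying a list of numbers."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def annual_means(monthly, months_per_year=12):
    """Aggregate a monthly series to annual means (a coarser temporal scale)."""
    if len(monthly) % months_per_year != 0:
        raise ValueError("series length must be a whole number of years")
    return [
        sum(monthly[i:i + months_per_year]) / months_per_year
        for i in range(0, len(monthly), months_per_year)
    ]


def compare(model_monthly, obs_annual):
    """Compare model output with observations and record how it was done."""
    model_annual = annual_means(model_monthly)
    if len(model_annual) != len(obs_annual):
        raise ValueError("model and observations cover different years")
    diffs = [m - o for m, o in zip(model_annual, obs_annual)]
    result = {
        "bias": sum(diffs) / len(diffs),
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }
    # Ad hoc provenance record (an assumption, not a standard format):
    # hashes of the inputs, the ordered processing steps, and the outputs.
    provenance = {
        "inputs": {
            "model_monthly": fingerprint(model_monthly),
            "obs_annual": fingerprint(obs_annual),
        },
        "steps": ["annual_means", "bias", "rmse"],
        "outputs": result,
    }
    return result, provenance
```

Calling compare with two years of monthly model values and two annual observations returns the metrics and a record that could be stored alongside them. Because the inputs are identified by hash, a later reader can check whether a rerun used the same data.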

Primary Mentor: 
Bob Cook
Secondary Mentor: 
Yaxing Wei