Sharing reproducible research through DataONE and Whole Tale

Seokki Lee

Seokki Lee is a fourth-year PhD student in the database group at Illinois Institute of Technology. His research focuses on data provenance and missing answers, specifically on unifying provenance and missing answers for queries with negation, and on providing approximate summaries of them, within a single system. He holds master's degrees in Computer Science and Engineering from Hanyang University and in Engineering Management from California State University, Northridge, as well as a bachelor's degree in Computer Engineering from Dankook University. Outside of work and research, he enjoys playing tennis and spending time with his family.

Project Description: 

Reproducible research is the essence of empirical science, and yet common practices fall short of producing results that are fully transparent and reproducible. This summer internship will focus on building a collaboration between working groups conducting ecological synthesis at NCEAS and the intern, who will help store the results of these computational syntheses in a fully reproducible and transparent manner using provenance tools and standards from DataONE. The intern will work with researchers to understand and conduct computational analyses reproducibly, and then use the Whole Tale system to document and archive “tales” in DataONE. In Whole Tale, a tale represents a set of scientific results, such as modeling output, figures, tables, and derived data, along with the documentation needed to understand those outputs and their links to the computational processes that generated them. This provenance information includes a full manifest of the data inputs, the computational code and processes used, the outputs, and the execution environment in which the results were generated. The Whole Tale system can be used to generate this package of reproducible research results. We expect the intern to identify one or two existing working groups at NCEAS that are conducting synthesis, to help those groups produce fully reproducible tales describing their results, and to publish these tales with a DOI in DataONE.

Primary Mentor: 
Bertram Ludaescher