Document steps used in data processing

Best Practice: 

A project typically creates several kinds of new data along the way, for instance visualizations, plots, statistical outputs, or a new dataset created by integrating multiple datasets. Whenever possible, document your workflow (the process used to clean, analyze, and visualize data), noting which data products are created at each step. Depending on the nature of the project, this documentation might take the form of a computer script or of notes in a text file describing the process you used (i.e., process metadata). If workflows are preserved along with the data products, others can re-execute them to reproduce those products.
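As one illustration of a workflow captured as a script, the minimal Python sketch below records a process-metadata entry each time a step creates a data product. Everything in it is an assumption made for the example: the helper `record_step`, the function `run_workflow`, the field `temp_c`, and the sample rows are hypothetical, and a real project would substitute its own steps and file names.

```python
import json
from datetime import datetime, timezone
from statistics import mean


def record_step(log, name, description, inputs, outputs):
    """Append one process-metadata entry describing a workflow step."""
    log.append({
        "step": len(log) + 1,
        "name": name,
        "description": description,
        "inputs": list(inputs),
        "outputs": list(outputs),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    })


def run_workflow(raw_rows):
    """Clean and summarize the rows, logging the product of each step."""
    log = []

    # Step 1 (clean): drop rows whose temperature reading is missing.
    cleaned = [row for row in raw_rows if row["temp_c"] is not None]
    record_step(log, "clean", "Dropped rows with missing temp_c",
                inputs=["raw_rows"], outputs=["cleaned_rows"])

    # Step 2 (analyze): compute a simple statistical output.
    summary = {
        "n": len(cleaned),
        "mean_temp_c": mean(row["temp_c"] for row in cleaned),
    }
    record_step(log, "summarize", "Computed row count and mean of temp_c",
                inputs=["cleaned_rows"], outputs=["summary"])

    return summary, log


# Hypothetical sample data, used only to make the sketch runnable.
raw_rows = [
    {"site": "A", "temp_c": 10.0},
    {"site": "B", "temp_c": None},
    {"site": "C", "temp_c": 14.0},
]

summary, workflow_log = run_workflow(raw_rows)

# The log can be saved next to the data products as process metadata.
print(json.dumps(workflow_log, indent=2))
```

Keeping the log in the same script that produces the data means the documentation cannot drift away from what was actually run. Writing the JSON to a file stored alongside the data products would preserve it with them.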

Description Rationale: 

To enable others to verify the quality of a given data product and, ideally, to reproduce it, the steps followed to create that product must be properly documented.

Additional Information: 

This best practice is also applicable to other categories including Analysis and Visualization and Data Documentation.

  • Juliana Freire, Cláudio T. Silva, Steven P. Callahan, Emanuele Santos, Carlos Eduardo Scheidegger, Huy T. Vo: Managing Rapidly-Evolving Scientific Workflows. IPAW 2006: 10-18
  • Juliana Freire, David Koop, Emanuele Santos, Cláudio T. Silva: Provenance for Computational Tasks: A Survey. Computing in Science and Engineering 10(3): 11-21 (2008)