Provide a citation and document provenance for your dataset

Best Practice: 

For appropriate attribution and provenance of a dataset, the following information should be included in the data documentation or the companion metadata file:

  • Name the people responsible for the dataset throughout its lifetime, including for each person:
    • Name
    • Contact information
    • Role (e.g., principal investigator, technician, data manager)

According to the International Polar Year Data and Information Service, an author is the individual(s) whose intellectual work, such as a particular field experiment or algorithm, led to the creation of the dataset. People responsible for the data can include individuals, groups, compilers, or editors.

  • Description of the context of the dataset with respect to a larger project or study (include links and related documentation), if applicable.
  • Revision history, including additions of new data and error corrections.
  • Links to source data, if the data in one dataset were derived from data in another dataset.
  • List of project support (e.g., funding agencies, collaborators, material support).
  • Describe how to properly cite the dataset. The data citation should include:
    • All contributors
    • Date of dataset publication
    • Title of dataset
    • Media or URL
    • Data publisher
    • Identifier (Digital Object Identifier)
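
The items listed above could be recorded in a machine-readable companion metadata file. The following is a minimal sketch in Python, assuming the file is written as JSON; the class names, field names, and all sample values are hypothetical placeholders and do not follow any particular metadata standard.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Person:
    name: str
    contact: str
    role: str  # e.g. "principal investigator", "technician", "data manager"


@dataclass
class Revision:
    date: str         # date of the change, ISO 8601 (assumed convention)
    description: str  # new data added or error corrected


@dataclass
class Provenance:
    people: List[Person]
    project_context: str = ""                                  # larger project or study
    revisions: List[Revision] = field(default_factory=list)    # revision history
    source_datasets: List[str] = field(default_factory=list)   # links to source data
    support: List[str] = field(default_factory=list)           # funders, collaborators

    def to_json(self) -> str:
        # asdict() converts nested dataclasses to plain dicts and lists
        return json.dumps(asdict(self), indent=2)


# Placeholder values for illustration only; none refer to a real dataset.
record = Provenance(
    people=[
        Person("Example Investigator", "investigator@example.org",
               "principal investigator"),
        Person("Example Manager", "manager@example.org", "data manager"),
    ],
    project_context="Part of a hypothetical multi-site field study",
    revisions=[Revision("2020-01-15", "Corrected unit error in one column")],
    source_datasets=["https://example.org/source-dataset"],
    support=["Hypothetical funding agency"],
)

print(record.to_json())
```

Keeping the revision history as a list of dated entries means additions and corrections can be appended without rewriting earlier entries.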

Description Rationale: 

    Documenting the dataset origin, history, and contact information allows for proper citation of datasets. By encouraging the proper citation of datasets, data providers and publishers receive appropriate credit for their efforts.

    Additional Information: 

    The Oak Ridge National Laboratory Distributed Active Archive Center has guidance and rationale for citing datasets:
    Editorial: Citations to Published Data Sets.

    Buneman P, Khanna S, Tan W. 2001. Why and Where: A Characterization of Data Provenance. Pp. 316-330 in Lecture Notes in Computer Science. Springer Berlin/Heidelberg.
    Osterweil LJ, Clarke LA, Ellison AM, Boose E, Podorozhny R, Wise A. 2010. Clear and precise specification of ecological data management processes and dataset provenance. IEEE Transactions on Automation Science and Engineering 7(1):189-195.
    Simmhan YL, Plale B, Gannon D. 2005. A survey of data provenance in e-science. ACM SIGMOD Record 34(3):31-36.

    Examples: 

    Turner, D.P., W.D. Ritts, and M. Gregory. 2006. BigFoot NPP Surfaces for North and South American Sites, 2002-2004. Data set. Available on-line (http://daac.ornl.gov) from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/750.
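
A citation like the one above can be assembled from the recommended elements (contributors, publication date, title, media or URL, publisher, identifier). The sketch below is one possible way to do this in Python; the `DatasetCitation` class, its field names, and the `join_names` helper are hypothetical, and the output layout simply mirrors the example citation rather than any official style.

```python
from dataclasses import dataclass
from typing import List


def join_names(names: List[str]) -> str:
    """Join contributor names as 'A', 'A and B', or 'A, B, and C'."""
    if len(names) <= 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + ", and " + names[-1]


@dataclass
class DatasetCitation:
    contributors: List[str]  # all contributors, already formatted
    year: int                # date of dataset publication
    title: str               # title of dataset
    location: str            # media or URL
    publisher: str           # data publisher
    doi: str                 # identifier (Digital Object Identifier)

    def render(self) -> str:
        # Strip trailing periods so abbreviations such as "U.S.A."
        # do not produce a doubled period.
        authors = join_names(self.contributors).rstrip(".")
        publisher = self.publisher.rstrip(".")
        return (
            f"{authors}. {self.year}. {self.title}. Data set. "
            f"Available on-line ({self.location}) from {publisher}. "
            f"doi:{self.doi}."
        )


# Field values taken from the example citation above.
citation = DatasetCitation(
    contributors=["Turner, D.P.", "W.D. Ritts", "M. Gregory"],
    year=2006,
    title="BigFoot NPP Surfaces for North and South American Sites, 2002-2004",
    location="http://daac.ornl.gov",
    publisher=("Oak Ridge National Laboratory Distributed Active Archive "
               "Center, Oak Ridge, Tennessee, U.S.A."),
    doi="10.3334/ORNLDAAC/750",
)

print(citation.render())
```

Storing the elements separately and rendering the string on demand makes it easier to keep the citation consistent if, for example, a contributor is added in a later revision.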