For appropriate attribution and provenance of a dataset, the following information should be included in the data documentation or the companion metadata file:
- Names of the people responsible for the dataset throughout its lifetime, including for each person:
  - Contact information
  - Role (e.g., principal investigator, technician, data manager)
According to the International Polar Year Data and Information Service, an author is the individual or individuals whose intellectual work, such as a particular field experiment or algorithm, led to the creation of the dataset. People responsible for the data can include individuals, groups, compilers, or editors.
- All contributors
- Date of dataset publication
- Title of dataset
- Media or URL
- Data publisher
- Identifier (e.g., Digital Object Identifier)
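The fields listed above could be captured in a small structured record and checked for completeness before a dataset is published. The following is a minimal sketch in Python, not a prescribed format: the class names, field names, and citation layout are assumptions made for illustration, and the example people, publisher, URL, and DOI are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contributor:
    name: str
    role: str      # e.g., "principal investigator", "technician", "data manager"
    contact: str   # e.g., an email address


@dataclass
class DatasetAttribution:
    title: str
    contributors: List[Contributor]
    publication_date: str          # kept as text, e.g. "2010" or "2010-06-01"
    publisher: str
    media_or_url: str
    identifier: Optional[str] = None   # e.g., a Digital Object Identifier

    def missing_fields(self) -> List[str]:
        """Return the names of attribution fields that are still empty."""
        missing = []
        if not self.title.strip():
            missing.append("title")
        if not self.contributors:
            missing.append("contributors")
        for person in self.contributors:
            if not person.role.strip():
                missing.append(f"role for {person.name}")
            if not person.contact.strip():
                missing.append(f"contact for {person.name}")
        if not self.publication_date.strip():
            missing.append("publication date")
        if not self.publisher.strip():
            missing.append("publisher")
        if not self.media_or_url.strip():
            missing.append("media or URL")
        if not self.identifier:
            missing.append("identifier")
        return missing

    def citation(self) -> str:
        """Assemble a simple citation string (assumed layout, not a standard)."""
        authors = ", ".join(person.name for person in self.contributors)
        text = (
            f"{authors}. {self.publication_date}. {self.title}. "
            f"{self.publisher}. {self.media_or_url}."
        )
        if self.identifier:
            text += f" doi:{self.identifier}."
        return text


# Hypothetical example record; none of these values refer to a real dataset.
example = DatasetAttribution(
    title="Example Soil Moisture Measurements",
    contributors=[
        Contributor("Researcher A", "principal investigator", "a@example.org"),
        Contributor("Manager B", "data manager", "b@example.org"),
    ],
    publication_date="2010",
    publisher="Example Data Center",
    media_or_url="https://data.example.org/soil-moisture",
    identifier="10.0000/example.1",
)

if __name__ == "__main__":
    print(example.citation())
    print("Missing fields:", example.missing_fields())
```

Keeping the role and contact details on each contributor, rather than in free text, makes it straightforward to flag incomplete records before the documentation is released.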
Documenting the dataset's origin, history, and contact information allows datasets to be cited properly. Proper citation, in turn, ensures that data providers and publishers receive appropriate credit for their efforts.
The Oak Ridge National Laboratory Distributed Active Archive Center provides guidance on, and a rationale for, citing data sets:
Editorial: Citations to Published Data Sets.
Buneman P, Khanna S, Tan W. 2001. Why and Where: A Characterization of Data Provenance. Pp. 316-330 in Lecture Notes in Computer Science. Springer Berlin/Heidelberg.
Osterweil LJ, Clarke LA, Ellison AM, Boose E, Podorozhny R, Wise A. 2010. Clear and precise specification of ecological data management processes and dataset provenance. IEEE Transactions on Automation Science and Engineering 7(1):189-195.
Simmhan YL, Plale B, Gannon D. 2005. A survey of data provenance in e-science. ACM SIGMOD 34(3):31-36.
Turner, D.P., W.D. Ritts, and M. Gregory. 2006. BigFoot NPP Surfaces for North and South American Sites, 2002-2004. Data set. Available on-line (http://daac.ornl.gov) from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/750.