The application period for the 2018 DataONE Summer Internship Program has now CLOSED

The 2018 DataONE Summer Internship Program

The Data Observation Network for Earth (DataONE) is a virtual organization dedicated to providing open, persistent, robust, and secure access to biodiversity and environmental data, supported by the U.S. National Science Foundation. DataONE is pleased to announce the availability of summer research internships for undergraduates, graduate students and recent postgraduates.

Program Information

Interns undertake a 9-week program of work centered around one of the projects listed below. Each intern will be paired with one primary mentor and, in some cases, secondary and tertiary mentors. Interns need not necessarily be at the same location or institution as their mentor(s).


February 19 - Application period opens
March 26 - Deadline for receipt of applications at midnight Mountain time
April 9 - Notification of acceptance and scheduling of face-to-face meetings (schedules permitting)
May 21 - Program begins*
June 19 - Midterm evaluations
July 20 - Program concludes**
* Some allowance will be made for students who are unavailable during these dates due to their school calendar.
** Program may not extend beyond Aug 10, 2018.


The program is open to undergraduate students, graduate students, and postgraduates who have received their degree within the past five years. Given the broad range of projects, there are no restrictions on academic background or field of study. Interns must be at least 18 years of age by the program start date, must be currently enrolled or employed at a U.S. university or other research institution, and must currently reside in, and be eligible to work in, the United States. Interns are expected to be available approximately 40 hours/week during the internship period (noted above), with significant availability during normal business hours. Interns from previous years are eligible to participate.

Financial Support

Interns will receive a stipend of $5,000 for participation, paid in two installments (one at the midterm and one at the conclusion of the program). In addition, required travel expenses will be borne by DataONE. Participation in the program after the midterm is contingent on satisfactory performance. The University of New Mexico will administer funds. Interns will need to supply their own computing equipment and internet connection. For students who are not U.S. citizens or permanent residents, complete visa information will be required, and it may be necessary for the funds to be paid through the student’s university or research institution. In such cases, the student will need to provide the necessary contact information for their organization.


Projects will cover a range of topic areas and vary in the extent and type of prior background required of the intern. Not all projects are guaranteed funding and the interests and expertise of the applicants will, in part, determine which projects will be selected for the program.

2018 Project Titles

Project 1: Sharing reproducible research through DataONE and Whole Tale
Project 2: Supporting Synthesis Science with DataONE
Project 3: Communications and Outreach: Development of a Primer for Early-Career Researchers
Project 4: Extending Libmagic for Identification of Science Resources

Project Details

Project 1: Sharing reproducible research through DataONE and Whole Tale

Primary Mentor(s): Bertram Ludäscher
Secondary Mentor(s): Matt Jones

Necessary Prerequisites:

  • Interest and enthusiasm for open science and reproducible research
  • Background in scientific analysis and computing using R, Python, or Matlab

Desirable Skills / Qualifications:

  • Background in ecological or environmental science.

Expected Outcomes:
Reproducible research narratives (“tales”) in the Whole Tale repository ...

  • linked to input and output datasets in DataONE
  • with provenance linkages fully describing the computational processes that produced those outputs
  • and specifications for the computational environment used
  • and all published as a citable research package in DataONE

Project Description:
Reproducible research is the essence of empirical science, and yet common practices fall short of producing results that are fully transparent and reproducible. This summer internship will focus on building a collaboration between working groups conducting ecological synthesis at NCEAS and an intern who wants to enable the results of these computational syntheses to be stored in a fully reproducible and transparent manner using provenance tools and standards from DataONE. The intern will work with researchers to understand and conduct computational analysis in a reproducible manner, and then use the Whole Tale system to document and archive “tales” in DataONE. In Whole Tale, a tale represents a set of scientific results, such as modeling output, figures, tables, and derived data, along with the documentation needed to understand those outputs and their linkages to the computational processes that generated them. This provenance information includes a full manifest of the data inputs, the computational code and processes used, the outputs, and the execution environment from which the results were generated. The Whole Tale system can be used to generate this package of reproducible research results. We would expect the intern to identify one or two extant working groups at NCEAS that are conducting synthesis, to help those groups produce fully reproducible tales describing their results, and to publish these with a DOI in DataONE.

Project 2: Supporting Synthesis Science with DataONE

Primary Mentor(s): Megan Mach
Secondary Mentor(s): Amber Budden

Necessary Prerequisites:
Good data management skills, experience with reference management software, familiarity with synthesis research, institutional library access to publications.

Desirable Skills / Qualifications:
Experience with systematic reviews, background in Earth / environmental science, experience with the R statistical package / RStudio.

Expected Outcomes:
There are two primary outcomes from the project:

  • A database of published Earth and environmental synthesis papers with associated variables including author, bibliometrics (citations, etc.), data objects used, host repository of data objects, and presence in DataONE, together with comprehensive documentation of the process used to develop the database.
  • A draft publication exploring the value of DataONE to synthesis research through analysis of the database in relation to current DataONE holdings.

Project Description:
About DataONE
DataONE supports synthesis research through enhanced search and discovery of Earth and environmental science data from across a network of integrated data repositories. Efficiencies for researchers can include reduced time spent on data discovery, a refined search function that returns more relevant results, and the ability to download data from multiple repositories, among others. Researchers working in synthesis science or conducting systematic reviews or meta-analyses will benefit from using DataONE as a data search engine.
The problem
Obtaining metrics on the usage of DataONE for development of published research is challenging. Products of synthesis research appropriately cite the data objects and the repository in which they are hosted, but not the method through which the authors discovered the data.
The project
This project will conduct a systematic review of published Earth and environmental science synthesis papers. From this set of papers, we will identify the data used and the repositories in which they are located, and explore whether those data are currently exposed within DataONE. This will enable us to demonstrate the percentage of synthesis papers that could have been completed using data currently found in DataONE and to identify additional data repositories that DataONE might seek to include as part of the network.

Project 3: Communications and Outreach: Development of a Primer for Early-Career Researchers

Primary Mentor(s): Megan Mach
Secondary Mentor(s): Amber Budden

Necessary Prerequisites:

  • Degree in the life-sciences or in communications / marketing
  • Familiarity with research data management and data science practices
  • Familiarity with marketing and communications
  • Good project management skills

Desirable Skills / Qualifications:

  • Experience in print production; layout and design
  • Experience with graphic design
  • Access to the Adobe suite of design software

Expected Outcomes:

  • Primer for early-career researchers
  • Communication plan for sharing the primer
  • Initial foray into sharing the primer

Project Description:
About DataONE
Data are diverse--spanning time, space, and disciplines--and are being generated at unprecedented rates. These large volumes of data are required to answer the most pressing ecological challenges, yet such globally distributed data can be difficult for researchers, educators, and others to discover, access, and integrate. DataONE addresses these challenges by providing a single search interface that allows discovery of content from an ever-growing collection of data repositories. You no longer need to search across multiple platforms at multiple sites. Using DataONE simplifies the process by saving time and effort, helping you achieve greater efficiency and productivity. Along with the search tool, DataONE has high-quality resources for training in data management, including teaching materials, webinars, and a database of best practices to improve methods for data sharing and management.
The project
Accessing and managing data can be an early-career researcher’s worst nightmare. During this internship with DataONE you will create a primer for early-career Earth and environmental researchers, guiding them on how DataONE can help them with this process. The primer will connect researchers to the search capabilities of search.dataone.org and walk them through the process of managing their own data using DataONE resources. Once the primer is created, the intern will develop and begin implementing a communication plan for sharing this resource with various academic institutions. The primer will be designed to have multiple facets, connecting researchers to online resources through both print media and digital PDF. We are excited to offer this opportunity to someone within media/communication/marketing or a related field to work with the DataONE team and our network of Earth and environmental data repositories.

Project 4: Extending Libmagic for Identification of Science Resources

Primary Mentor(s): Dave Vieglais
Secondary Mentor(s): Chris Jones

Necessary Prerequisites:

  • Familiarity with regular expressions
  • XML syntax, validation, parsing
  • Competence with Linux command line tools
  • Programming skills in Python and/or Java

Desirable Skills / Qualifications:

  • Familiarity with different science file formats
  • Familiarity with the Linux "file" command and/or JHOVE

Expected Outcomes:

  • Additional "magic" files for libmagic to identify different science data and metadata file formats
  • Simple REST service implementation to facilitate identification of file types from light-weight clients such as javascript applications in browsers
  • Documentation on procedures for adding rules for additional file formats

Project Description:
Reliable determination of file formats is necessary to help ensure appropriate processing can be applied to the file. This is especially important when files are intended to be reused in the future, since any knowledge of the producing system may be lost. There are many subtle variations in file formats that have significant implications for consumers. For example, many metadata standards are serialized as XML (text/xml or application/xml media type), but more detail is required for actual processing of the metadata. This information is usually available through a combination of the namespace(s) and schema(s) referenced by the XML. Manual interpretation of this information is relatively straightforward but error-prone, because of subtle differences that may be present.
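To make the XML case concrete, here is a minimal sketch in Python of identifying a metadata document from the namespace of its root element. The function names and the lookup table are hypothetical and are not part of DataONE or libmagic; the two namespace URIs are given as illustrations and should be checked against the relevant schemas before being relied on.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical lookup table: root-element namespace -> format label.
# The URIs are illustrative and should be verified against each schema.
KNOWN_NAMESPACES = {
    "eml://ecoinformatics.org/eml-2.1.1": "EML 2.1.1",
    "http://www.isotc211.org/2005/gmd": "ISO 19139 (gmd)",
}


def root_namespace(xml_bytes):
    """Return (namespace, local_name) of the root element.

    Only the first "start" event is consumed, so the rest of the
    document is not inspected.
    """
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        tag = elem.tag
        if tag.startswith("{"):
            # ElementTree writes qualified names as "{namespace}local".
            namespace, _, local = tag[1:].partition("}")
            return namespace, local
        return "", tag  # root element has no namespace
    return None


def identify(xml_bytes):
    """Map the root namespace to a format label, or report it as unknown."""
    namespace, _local = root_namespace(xml_bytes)
    return KNOWN_NAMESPACES.get(namespace, "unknown XML")
```

A real identifier would likely also look at the schema locations referenced by the document, since the namespace alone does not always distinguish between versions of a standard.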
The goal of this project is to extend the capabilities of the Linux (or equivalents on OS X and Windows) file command to allow automatic identification of common science metadata and data formats. Two main activities are anticipated to achieve this goal. 1) Supporting additional file formats by extending or adding to the existing "magic" configuration files used by the file command. These magic files contain rules that enable identification of files by matching patterns within the file. 2) Provision of a simple REST service that accepts a file (or a portion thereof) and returns a JSON encoded response containing the identification of the file as provided by the file command.
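The idea behind a magic rule, matching a byte pattern at a known offset, can be illustrated with a short Python sketch. This is not libmagic's rule syntax, and the table and function names are hypothetical; it only shows the kind of test a rule encodes, using the well-known leading signatures of HDF5 and classic NetCDF files.

```python
# Each entry is (offset, byte pattern, label). Hypothetical table for
# illustration only; real libmagic rules are written in its own format.
SIGNATURES = [
    (0, b"\x89HDF\r\n\x1a\n", "HDF5 data"),
    (0, b"CDF\x01", "NetCDF classic format"),
    (0, b"CDF\x02", "NetCDF 64-bit offset format"),
]


def match_signature(data):
    """Return the label of the first signature found in data, else None."""
    for offset, pattern, label in SIGNATURES:
        if data[offset:offset + len(pattern)] == pattern:
            return label
    return None
```

Because only the first few bytes are examined, the same check works on a leading portion of a file, which is what makes the "portion thereof" option for the REST service practical.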
- https://github.com/threatstack/libmagic
- http://jhove.openpreservation.org/
- https://linux.die.net/man/1/file
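The REST service described above could take roughly the following shape. This is a sketch using only the Python standard library, under several assumptions: the handler, function names, port, and JSON field names are all hypothetical, the service accepts the file as a raw POST body, and a `file` binary that can read from standard input (`-`) is available on the host.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer


def identify_with_file(data):
    """Pipe bytes to the system `file` command and collect its answers."""
    def run(*flags):
        result = subprocess.run(
            ["file", "--brief", *flags, "-"],  # "-" reads from stdin
            input=data, stdout=subprocess.PIPE, check=True,
        )
        return result.stdout.decode("utf-8", "replace").strip()

    return {"description": run(), "mime_type": run("--mime-type")}


def make_response(data, identify=identify_with_file):
    """Build the JSON body; `identify` can be swapped out for testing."""
    return json.dumps({"size": len(data), **identify(data)}).encode("utf-8")


class IdentifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the uploaded bytes (a whole file or a leading portion).
        length = int(self.headers.get("Content-Length", 0))
        body = make_response(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Start the service; blocks until interrupted."""
    HTTPServer((host, port), IdentifyHandler).serve_forever()
```

Calling `serve()` starts the service locally, after which a client can POST file bytes to it and read back the JSON. A production service would also need upload size limits, error handling for a failing `file` call, and cross-origin headers for browser clients.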

To Apply

Full details of the application process, and links to forms, will be available when the application period opens.

Evaluation of applications

Applications will be evaluated according to the following criteria:

  • The academic and technical qualifications of the applicant.
  • Evidence of strong written and oral communication skills.
  • The extent to which the applicant can provide substantive contributions to one or more projects, including the applicant’s ideas for project implementation.
  • The extent to which the internship would be of value to the career development of the applicant.
  • The availability of the applicant during the period of the internship.

Intellectual Property

DataONE is predicated on openness and universal access. Software is developed under one of several open source licenses, and copyrightable content produced during the course of the project will be made available under a Creative Commons (CC-BY 3.0) license. Where appropriate, projects may result in published articles and conference presentations, to which the intern is expected to make a substantive contribution and for which the intern will receive credit.

Funding acknowledgement

Previous Summer Internships were supported by a National Science Foundation Award (NSF Award 0830944): "DataNetONE (Observation Network for Earth)". Current Summer Internships are supported by National Science Foundation Award #1430508.

For more information

If you have questions or problems about the application process or internship program in general, please e-mail internship@dataone.org.