The 2015 DataONE Summer Internship Program is now CLOSED to applications
Thank you to everyone who applied.
The Data Observation Network for Earth (DataONE) is a virtual organization dedicated to providing open, persistent, robust, and secure access to biodiversity and environmental data, supported by the U.S. National Science Foundation. DataONE is pleased to announce the availability of summer research internships for undergraduates, graduate students and recent postgraduates.
Interns undertake a nine-week program of work centered on one of the projects listed below. Each intern will be paired with one primary mentor and, in some cases, secondary and tertiary mentors. Interns need not be at the same location or institution as their mentor(s). Interns and mentors are expected to meet face-to-face at the beginning of the summer and to maintain frequent communication throughout the program. Interns are required to work in an open notebook environment.
February 13 - Application period opens
March 16 - Deadline for receipt of applications at midnight Mountain time
April 1 - Notification of acceptance and scheduling of face-to-face meetings (schedules permitting)
May 25 - Program begins*
June 22 - Midterm evaluations
July 24 - Program concludes**
* Some allowance will be made for students who are unavailable during these dates due to their school calendar.
** Program may not extend beyond August 14, 2015.
The program is open to undergraduate students, graduate students, and postgraduates who have received their degree within the past five years. Given the broad range of projects, there are no restrictions on academic background or field of study. Interns must be at least 18 years of age by the program start date, must be currently enrolled or employed at a U.S. university or other research institution, and must currently reside in, and be eligible to work in, the United States. Interns are expected to be available approximately 40 hours/week during the internship period (noted above), with significant availability during normal business hours. Interns from previous years are eligible to participate.
Interns will receive a stipend of $5,000 for participation, paid in two installments (one at the midterm and one at the conclusion of the program). In addition, required travel expenses will be borne by DataONE. Participation in the program after the midterm is contingent on satisfactory performance. The University of New Mexico will administer funds. Interns will need to supply their own computing equipment and internet connection. For students who are not U.S. citizens or permanent residents, complete visa information will be required, and it may be necessary for the funds to be paid through the student’s university or research institution. In such cases, the student will need to provide the necessary contact information for their organization.
Projects cover a range of topic areas and vary in the extent and type of prior background required of the intern. Not all projects are guaranteed funding and the interests and expertise of the applicants will, in part, determine which projects will be selected for the program. The titles and descriptions of this year’s projects are posted below.
2015 Project Titles
1) Evaluating the impact of data access: The role of metrics
2) Making a Robust and Useful Earth Science Ontology Repository: Creating a test suite, automating ontology uploading, and standardizing RESTful API
3) Front end web site for the Earth Science Ontology Repository
4) Catalog Services for the Web (CSW) Adaptor for The Generic Member Node
5) A Tool for Live, Interactive Workflow Views over Programming Scripts
6) Network and social media communication analysis of the DataONE user community
Project 1: Evaluating the impact of data access: The role of metrics
Primary Mentor: Suzie Allard
Secondary Mentor: Mike Frame
Additional Mentors: Carol Tenopir, Amber Budden
Necessary Prerequisites: The student should be a graduate student in a science program (including information science) or a related discipline.
Desirable Skills / Qualifications: Familiarity with web analytics, information seeking/usage, performance metrics, and the principles of database application analysis. A background in Information Science is desired. Knowledge of STEM is preferred; at a minimum, the student must have an interest in STEM.
Expected Outcomes: The student will help identify current metrics, and suggest future ones, that can be used to determine DataONE's role in the interconnections made through data access and the potential for new types of science that may only now be emerging and remain largely unrealized.
DataONE has a unique mission: it creates a foundation for supporting existing science and for enabling new science. Normal evaluation metrics provide a means for measuring this foundation. However, to our knowledge, there are no agreed-upon metrics for identifying how access to data has created new interconnections or for predicting the potential for new types of science that may only now be emerging and whose potential is as yet unrealized. Nevertheless, embarking upon the development of both quantitative and qualitative indicators can and should lead to useful metrics that will prove of value to DataONE, other DataNets, and the broader community.
The student will conduct a comprehensive literature review of how impact is currently defined and evaluated, followed by an environmental scan to determine what metrics currently exist for measuring the interconnections leading to new science by data infrastructures. A comprehensive review of related projects, their associated metrics, and outcomes is critical in identifying these potential new metrics. Based on these scans and in consultation with the mentors, the student will then suggest potential new metrics and their associated methodologies that could be adopted by DataONE.
Project 2: Making a Robust and Useful Earth Science Ontology Repository: Creating a test suite, automating ontology uploading, and standardizing RESTful API
Primary Mentor: Deborah L. McGuinness
Secondary Mentor: Xixi Luo
Necessary Prerequisites: Computer Science major
Desirable Skills / Qualifications: Demonstrated experience in software testing and scripting languages. Programming skills, ability to work independently and meet deadlines. Knowledge of Semantic Web technologies is desired.
Expected Outcomes:
- Browsable report of the unit testing results.
- Script or program that will automate the ontology uploading process without interrupting the Earth Science Ontology Repository’s service.
- Easily understood documentation of the Earth Science Ontology Repository’s RESTful API for developers. The format should be as described here: https://purl.dataone.org/architecture-dev/apis/MN_APIs.html
The Earth Science Ontology Repository (ESOR) portal contains many vocabularies that are important and useful for sharing earth science data. It serves a function similar to that of BioPortal, which contains a collection of ontologies that support biology, health, and life sciences research, but with a focus on the Earth Science domain. DataONE will benefit from a collection of ontologies with well-defined terms that are used in earth science data, so that earth science data may be integrated in a correct and consistent manner and so that search services may be enhanced. Search over the Earth Science Ontology Repository is “smart” in that its implementation is not based only on keyword search; semantic techniques are also involved so that the search functionality can actually “understand” the meaning of terms. ESOR can be used as the backend knowledge base for multiple applications, for example semi-automatic or automatic entity matching.
In order to turn the Earth Science Ontology Repository into a product, we need to create unit tests: a separate, standalone testing capability using JUnit, so that we can be confident the repository can handle the different use cases that arise in different situations. This test suite will allow automatic testing of updates so that the repository can grow with minimal human effort and a level of consistency can be guaranteed.
In order for this repository to be sustainable, we also need to have simple and automatic (or at least semi-automatic) methods for enhancing the content. Right now, deploying a new ontology to the Earth Science Ontology Repository involves 14 manual steps (see the details of the 14 steps here). In order to speed up the process, it is necessary to explore possibilities for automation. In this project, robust automatic upload processes will be designed, tested, and deployed.
Meanwhile, for the Earth Science Ontology Repository to be more broadly reused, it needs to conform to the principles of RESTful services, and its API needs to have easily understandable documentation for developers. The summer intern will work with the postdoctoral fellow who has created the ontology and a professor who is a leading expert in ontology environments to complete the project. In addition to creating a much more usable repository with these services, we expect to write a paper for publication. The student is welcome to co-author the publication.
Project 3: Front end web site for the Earth Science Ontology Repository
Primary Mentor: Deborah L. McGuinness
Secondary Mentor: Xixi Luo
Necessary Prerequisites: Computer Science major
Desirable Skills / Qualifications: Demonstrated experience in website design. Programming skills, ability to work independently and meet deadlines. Knowledge of Semantic Web technologies is desired.
Expected Outcomes: A user-friendly front-end web site that can handle various tasks, such as ontology uploading, semantic search, and annotation.
The Earth Science Ontology Repository (ESOR) portal contains many vocabularies that are important for sharing earth science data. It serves a function similar to that of BioPortal, but with more of a focus on the Earth Science domain. DataONE will benefit from a collection of ontologies with well-defined terms that are used in earth science data, so that earth science data may be integrated in a correct and consistent manner and so that search services may be enhanced. Search over the Earth Science Ontology Repository is “smart” in that its implementation is not based only on keyword search; semantic techniques are also involved so that the search functionality can actually “understand” the meaning of terms. ESOR can be used as the backend knowledge base for multiple applications, for example semi-automatic or automatic entity matching.
In order to turn the Earth Science Ontology Repository into a product, we need to have a user-friendly front-end web site that can handle various tasks, such as ontology uploading, semantic search, and annotation. The summer intern will work with the postdoctoral fellow who has created the ontology and a professor who is a leading expert in ontology environments to complete the project. In addition to creating a much more usable web front-end service, we expect to write a paper for publication. The student is welcome to co-author the publication.
Project 4: Catalog Services for the Web (CSW) Adaptor for The Generic Member Node
Primary Mentor: Mark Servilla
Secondary Mentor: Mark Flynn
Additional Mentors: Laura Moyers, Dave Vieglais, Bruce Wilson
Necessary Prerequisites: Good background in the Python programming language
Desirable Skills / Qualifications: Familiarity with the Open Geospatial Consortium Catalog Services for the Web standard. Experience programming web services in Python. Experience processing XML with Python.
Expected Outcomes: An adaptor that enables the DataONE Generic Member Node software to connect with an OGC CSW service.
The Open Geospatial Consortium Catalog Services for the Web (http://en.wikipedia.org/wiki/Catalog_Service_for_the_Web) standard provides a convenient mechanism to programmatically access data from data repositories. The CSW service aligns conceptually, though not programmatically, with the DataONE REST services that Member Nodes already implement (https://purl.dataone.org/architecture/apis/MN_APIs.html) to enable participation in the DataONE federation.
The goal of this project is to develop an adaptor that enables the DataONE Generic Member Node software stack (http://pythonhosted.org/dataone.generic_member_node/) to act as a proxy to one or more CSW services, and so provide a convenient mechanism for CSW compliant services to participate in the DataONE federation with minimal effort.
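As a rough illustration of the CSW side of such an adaptor, the sketch below builds a paged CSW 2.0.2 GetRecords request using key-value-pair encoding. It is only a sketch under stated assumptions: the endpoint URL and the helper names (`build_getrecords_url`, `page_starts`) are hypothetical, and it shows nothing of how the Generic Member Node itself would consume the results.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getrecords_url(endpoint, start=1, page_size=10):
    """Return a CSW 2.0.2 GetRecords URL using key-value-pair (KVP) encoding."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "full",
        "resultType": "results",
        "startPosition": start,  # CSW record positions are 1-based
        "maxRecords": page_size,
    }
    return endpoint + "?" + urlencode(params)


def page_starts(total_records, page_size):
    """Start positions needed to page through total_records records."""
    return list(range(1, total_records + 1, page_size))


# Hypothetical endpoint, used only for illustration.
url = build_getrecords_url("https://csw.example.org/csw", start=11, page_size=10)
query = parse_qs(urlsplit(url).query)
print(url)
```

An adaptor acting as a proxy would presumably issue one such request per page and translate each returned record into the form the Member Node API expects; that translation is the substance of the project and is not sketched here.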
Project 5: A Tool for Live, Interactive Workflow Views over Programming Scripts
Primary Mentor: Bertram Ludäscher
Secondary Mentor: Paolo Missier
Additional Mentors: Timothy McPhillips
Necessary Prerequisites: Experience in Python and/or Java Programming
Expected Outcomes: Interactive YW graph tool; software checked into https://github.com/yesworkflow-org.
YesWorkflow (YW) is a software toolkit that provides some of the benefits of using a scientific workflow system without having to rewrite scripts and other scientific software. Rather than reimplementing code so that it can be executed and managed by a workflow engine, a YW user simply adds special comments to existing scripts to declare how data is used and results are produced, step by step, by the script. YW uses these comments to create a rendering of the script as a workflow. A YW graphing module currently produces static graphical views (in Graphviz DOT format) of the resulting workflow model of the script.
The static graphs produced by YW can be large and complex. We propose to develop an interactive viewer for YW graphical output that will make these graphs easier to explore and interpret. For example, clicking on a data item in the workflow view will optionally highlight the (prospective) direct and indirect data dependencies for that data item (the data from which it will be derived when the script is run). Features for expanding and collapsing nested subworkflows will also facilitate exploration of these graphs.
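The dependency highlighting described above amounts to a traversal of the workflow graph. The following is a minimal sketch, assuming the workflow model can be reduced to a mapping from each data item to the items it is derived from; the mapping, its item names, and the `upstream` function are hypothetical and are not part of YesWorkflow.

```python
from collections import deque

# Hypothetical workflow model: each data item maps to the items it is derived from.
DERIVED_FROM = {
    "raw_readings": [],
    "calibration": [],
    "clean_readings": ["raw_readings", "calibration"],
    "daily_means": ["clean_readings"],
    "plot": ["daily_means"],
}


def upstream(item, derived_from):
    """Return (direct, indirect) dependencies of item as two sets."""
    direct = set(derived_from.get(item, []))
    seen = set(direct)
    queue = deque(direct)
    # Breadth-first walk backwards along the "derived from" edges.
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return direct, seen - direct


direct, indirect = upstream("daily_means", DERIVED_FROM)
print(sorted(direct), sorted(indirect))
```

A viewer could style the two returned sets differently, so that a click distinguishes immediate inputs from data further upstream.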
The interactive graph tool will also serve as an entry point to discovering and exploring the original scripts and relating them to the workflow graphs, e.g., clicking on a node in the YW view will allow the user to inspect the source code behind that node. Similarly, users of data products can view both the YW representation of the script, and the original data manipulation code corresponding to blocks in the workflow graphs.
Last but not least, the interactive graph will facilitate use of YesWorkflow as a design tool when developing new scripts (or even before a script is written) via live-update features. Given a set of script files, the live-graph feature will monitor these files for changes and update the chosen graphical view automatically. Users of this feature will continue to be able to use their favorite text editor or IDE for developing their scripts.
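One simple way to monitor script files for changes is to poll their modification times and sizes. The sketch below assumes polling is acceptable for this purpose; the `snapshot` and `changed_paths` helpers are hypothetical names, and a real tool might rely on operating-system file notifications instead.

```python
import os


def snapshot(paths):
    """Map each existing path to (modification time in ns, size in bytes)."""
    state = {}
    for path in paths:
        try:
            info = os.stat(path)
        except FileNotFoundError:
            continue  # a missing file is simply absent from the snapshot
        state[path] = (info.st_mtime_ns, info.st_size)
    return state


def changed_paths(before, after):
    """Return the paths added, removed, or modified between two snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

A live-update loop would take a snapshot, sleep briefly, take another, and regenerate the graphical view whenever `changed_paths` returns a non-empty list.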
Project 6: Network and social media communication analysis of the DataONE user community
Primary Mentor: Amber Budden
Secondary Mentors: Patricia Cruse, Yiwei Wang
Additional Mentors: Asa Scott
Necessary Prerequisites: Familiarity with social media, good database / data management skills. Interest in social network analysis, market research and business.
Desirable Skills / Qualifications: Experience in social network analysis, metric analysis, and data visualization. Programming skills are desirable.
Expected Outcomes: The proposed project will have two outcomes:
1) Using information from across all DataONE communication channels, this project will develop a comprehensive understanding of DataONE’s community of users and their use of social networking applications.
2) Using the gathered data, the project will create a network visualization depicting the clustering of stakeholder groups and the linkages between them, enabling a more critical assessment of the current and potential reach of DataONE.
The intern will work with the Director of Community Engagement and Outreach and the co-lead of the DataONE Sustainability and Governance Working Group to critically evaluate the currently engaged DataONE community. Information from mailing lists, social media accounts, webinar participation and website activity will be used to identify the outer range of the DataONE community. Calibration of user profiles across accounts will enable a more parsimonious estimate of the community, and the intern will then work with these data to construct a network visualization, breaking down clusters by stakeholder group and mode of engagement with DataONE. These data will help inform the project with respect to target markets, future product development and resource allocation.
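To make the idea of calibrating profiles across accounts concrete, the sketch below merges records from several channels by normalized e-mail address and then counts how many people each pair of channels shares. It assumes an e-mail address is available as a common key, which may not hold for every channel; the records, channel names, and function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (channel, e-mail address as entered by the user).
RECORDS = [
    ("mailing_list", "Ana@Example.org"),
    ("webinar", "ana@example.org "),
    ("social_media", "ben@example.org"),
    ("mailing_list", "ben@example.org"),
    ("webinar", "cy@example.org"),
]


def merge_profiles(records):
    """Group channels by normalized e-mail so each person is counted once."""
    channels = defaultdict(set)
    for channel, email in records:
        channels[email.strip().lower()].add(channel)
    return dict(channels)


def channel_links(profiles):
    """Count the people shared by each pair of channels (edge weights)."""
    weights = defaultdict(int)
    for channels in profiles.values():
        for a, b in combinations(sorted(channels), 2):
            weights[(a, b)] += 1
    return dict(weights)


profiles = merge_profiles(RECORDS)
links = channel_links(profiles)
print(len(profiles), links)
```

The merged profile count gives the deduplicated community estimate, and the weighted channel pairs could serve as edges in a network visualization.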
Applications must be completed by 11:59 PM (Mountain time) on March 16th. You will be asked to upload a cover letter and resume, both in PDF format. Applicants should also provide a letter of reference. The letter of reference should be sent directly by its author to email@example.com by the application deadline.
- The cover letter should address the following questions:
- Which DataONE Summer Internship project(s) are you most interested in and why?
- What contributions do you expect to be able to make to the project(s)?
- What background do you have which is relevant to the project(s)?
- What do you expect to learn and/or achieve by participating?
- What are your thoughts and ideas about the project, including particular suggestions for ways of achieving the project objectives?
- How will participation in this program help you achieve your educational and career objectives?
- Are there any factors that would affect your ability to participate, including other summer employment, university schedules, and other commitments?
- The resume should include the applicant’s educational history, current position, any publications or honors, and full contact information (including phone number, e-mail address, and mailing address).
- The letter of reference should be sent directly to firstname.lastname@example.org and should be from a professor, supervisor, or mentor.
Evaluation of applications
Applications will be judged by the following criteria:
- The academic and technical qualifications of the applicant.
- Evidence of strong written and oral communication skills.
- The extent to which the applicant can provide substantive contributions to one or more projects, including the applicant’s ideas for project implementation.
- The extent to which the internship would be of value to the career development of the applicant.
- The availability of the applicant during the period of the internship.
DataONE is predicated on openness and universal access. Software is developed under one of several open source licenses, and copyrightable content produced during the course of the project will be made available under a Creative Commons (CC-BY 3.0) license. Where appropriate, projects may result in published articles and conference presentations, to which the intern is expected to make a substantive contribution and for which the intern will receive credit.
Previous Summer Internships were supported by a National Science Foundation Award (NSF Award 0830944): "DataNetONE (Observation Network for Earth)". Current Summer Internships are supported by National Science Foundation Award #1430508.
For more information
If you have questions about, or problems with, the application process or the internship program in general, please send e-mail to email@example.com.