News Report No. 3 (25 March 2011) – Bill Michener, PI
Spring weather seems to be upon us. This past winter was an especially productive time for DataONE, and there are many significant activities to report on, including cyberinfrastructure development, Working Groups, the new DataONE Users Group, and the DataONE External Advisory Board. Notably, DataONE underwent a critical 18-month review and passed with flying colors. The Review Panel had many good questions and suggestions for us to consider as we move forward as a project and as a member of the federation of DataNet partners. Below, I summarize some of the recent progress and highlight some of the exciting activities on the short-term horizon.
The CI Team continues to meet its milestones and recently completed a detailed evaluation of the prototype infrastructure that included review of design and requirements, and testing of infrastructure performance, scalability, and overall stress and failure recovery capabilities. The initial DataONE cyberinfrastructure includes four core functionalities. These are:
1. Scheme Agnostic, Persistent Identifiers
2. Replication of Data and Metadata
3. Search and Discovery
4. Federated Identity and Access Control
The first three functionalities are complete. For the fourth, the Federated Security Working Group made significant progress in defining a federated identity framework. DataONE will use the CILogon extension to the InCommon federation identity provider network to enable both user and agent identification. Design aspects of user registration and access control vocabularies are in progress and will be a major focus of development during the third quarter of year two of the project, prior to the public release of DataONE in late 2011.
The Investigator ToolKit now has client libraries implemented in two widely used computer languages (Java and Python), and includes command-line clients, a client for the R statistical package, and a proof-of-concept “DataONE drive” that enables mounting the DataONE cloud infrastructure as a file system.
Working Group / Workshop updates
There has been significant progress on the part of the Working Groups over the past three months. In particular, three new Working Groups / Workshops were established and they have already made several important contributions to DataONE.
First, the Semantics and Data Integration Working Group led by Line Pouchard, Oak Ridge National Laboratory, and Jeff Horsburgh, Utah State University, held their first meeting at Stanford University January 24-26, 2011. The primary goal of this Working Group is to provide a strategic roadmap, with prioritization, for semantic technologies and standards relevant to the DataONE mission. This roadmap will identify gaps and propose mitigation strategies relating to semantic technologies to enable DataONE to reach its overarching goals by the end of year 2.
Second, the Community Engagement and Education Working Group met for the first time at the All Hands Meeting in November. Their charter was subsequently approved and plans were made for year 2.
Third, a DataONE Preservation Workshop was held in Chicago, IL in conjunction with the International Digital Curation Conference on December 5-6, 2010. The outcome of this workshop was the DataONE preservation strategy, which is a simple three-tiered approach:
1. Keep the bits safe. Retaining the actual bits that comprise the data is paramount, as all other preservation and access questions are moot if the bits are lost. Key sub-strategies for this tier are (a) persistent identification, (b) replication of data and metadata, (c) periodic verification that stored content remains uncorrupted, and (d) reliance on Member Nodes to adhere to DataONE protocols and guidelines consistent with widely adopted public and private sector standards for IT infrastructure management.
2. Protect the form, meaning, and behavior of the bits. Users must also be able to make sense of the preserved bits into the future, so protecting their form, meaning, and behavior is critical. In this tier we rely on (a) collecting characterization metadata, (b) encouraging use of standardized formats, and (c) securing legal rights appropriate to long-term archival management, all of which support future access and format migration and emulation as needed to preserve meaning and behavior.
3. Safeguard the guardians. The DataONE network itself provides resiliency against the occasional loss of Member Nodes, and this will be shored up by succession planning, ongoing investigations into preservation cost models, and open-source tools that are sustainable by external developer communities.
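To make sub-strategy (c) of the first tier more concrete, periodic verification is commonly done by recording a checksum when content is first stored and recomputing it later on each replica. The following is a minimal Python sketch of that general idea, not a description of how DataONE implements it. The function names, the node names, the sample data, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def compute_checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def verify_replicas(expected: str, replicas: dict, algorithm: str = "sha256") -> dict:
    """Map each replica's node name to True if its bytes still match `expected`."""
    return {
        node: compute_checksum(content, algorithm) == expected
        for node, content in replicas.items()
    }


def corrupted_nodes(expected: str, replicas: dict, algorithm: str = "sha256") -> list:
    """Return the sorted names of nodes whose copy no longer matches `expected`."""
    report = verify_replicas(expected, replicas, algorithm)
    return sorted(node for node, ok in report.items() if not ok)


# Hypothetical example: one object replicated to three illustrative nodes.
original = b"site,year,count\nA,2010,42\n"
recorded = compute_checksum(original)  # checksum recorded at deposit time

replicas = {
    "node-a": original,
    "node-b": original,
    "node-c": original.replace(b"42", b"43"),  # simulated corruption
}

print(corrupted_nodes(recorded, replicas))  # prints ['node-c']
```

In a sketch like this, a mismatch would typically prompt restoring the damaged copy from a replica that still verifies, which is one reason replication and verification appear together in the first tier.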
DataONE Users Group
The inaugural meeting of the DataONE Users Group (DUG) occurred December 9-10, 2010 in Chicago, IL. The DUG is the worldwide community of Earth observation data authors, users, and stakeholders that includes representatives of Coordinating and Member Nodes, research networks, agencies, libraries, academia and the public and, as a result, reflects broad diversity in expertise. The DUG meets annually and serves to represent the needs and interests of these stakeholder communities in the activities of DataONE and to provide guidance that facilitates DataONE in achieving its vision and mission.
At the inaugural meeting, the interim Chair, Robert Sandusky, University of Illinois at Chicago, and Vice-Chair, Richard Huffine, United States Geological Survey Library, were elected. The DUG has 28 founding members representing 30 organizations. One of the first major accomplishments of the DUG was to develop and approve the “Service Guidelines for Member Nodes” that governs relationships between DataONE and its network of Member Nodes. The Service Guidelines were approved on January 14, 2011 and the DUG charter was approved on January 10, 2011. The next meeting is scheduled for July 2011 in Santa Fe, NM in conjunction with the ESIP summer meeting.
DataONE External Advisory Board
The DataONE External Advisory Board (EAB) met January 17-18, 2011 in Santa Fe. The EAB members received a number of reports and presentations that responded to the recommendations from the July 2010 EAB meeting and described the current progress of the DataONE Project. There was also a very informative telephone conference with Alan Blatecky, NSF. It covered the NSF vision relating to research data and data-intensive science, future plans for the DataNet initiative, and potential global partnerships to support research data management, as well as current NSF thinking around data management plans, associated advocacy to the community, and the role of DataONE.
Several major activities are scheduled for the spring and summer of 2011 including:
- The next DataONE Users Group meeting is scheduled for July 2011 in Santa Fe, NM in conjunction with the Federation of Earth Science Information Partners (ESIP) summer meeting.
- In May, a week-long community engagement and education workshop to enhance DataONEpedia and develop Data Management Planning resources will be held in Santa Fe.
- DataONE Summer Internship Program for graduate students is now seeking students for the summer.
- The first Walter E. Dean Environmental Information Management Institute for students will be held in May and June.
- Education workshop for developing teaching modules.
- Societal workshops on data management and data management planning at the Ecological Society of America annual meeting in Austin, Texas.
- Workshop to create a set of online educational resources.
- Baseline assessments of libraries and librarians. Complementary surveys to evaluate the needs and perspectives of these stakeholder groups.
- Citizen Science and Outreach. Scheduled workshop focused on best practices for data management.
- Data Management Planning Online Tool. Developed in collaboration with the University of California Libraries, University of Virginia, University of Illinois, and the Digital Curation Centre.
Because of the increasing number of DataONE activities, this newsletter will shift to a new format and a monthly schedule starting in May. This will help limit the size of the newsletter, provide information in a more timely fashion, and allow the newsletter author to focus occasionally on a particular topic of interest.
I hope you have a most enjoyable spring.