News Reports from the DataONE Director’s Chair
“The first year of DataONE” News Report No. 2 (23 August 2010) – Bill Michener, PI
August 2010 marks the end of the first year of DataONE. This milestone is a good time to reflect on where we have been and where we are heading. There is much to report with respect to both the implementation of the cyberinfrastructure and community engagement via education and outreach.
Progress on the prototype implementation has proceeded as specified in the Project Management Plan. Three Member Nodes, three Coordinating Nodes, and core elements of the Investigator Toolkit were completed for the project's year-one milestone.
- The three Coordinating Nodes are deployed and operational (as non-public prototypes) at Oak Ridge Campus, The University of New Mexico, and the University of California, Santa Barbara.
- Three Member Nodes, selected because they broadly represent the content the project anticipates, have been deployed: one each for Dryad (at Duke's NESCent), the ORNL Distributed Active Archive Center, and the Knowledge Network for Biocomplexity (UC Santa Barbara). These Member Nodes have very different architectures and therefore provide excellent examples of the challenges that may arise during production deployment.
- The Investigator Toolkit currently includes client libraries in two widely used languages (Java and Python), command-line clients, a client for the R statistical analysis package, and a proof-of-concept "DataONE drive" that will enable users to mount the entire DataONE cloud infrastructure as a file system.
- The most significant element not completed in the prototype is the implementation of a federated identity framework, although good progress is being made towards selection of a suitable implementation.
Over the next two months, Dave Vieglais and the team of developers will complete a detailed evaluation of the prototype infrastructure, including a design and requirements review, performance and scalability testing, and tests of overall stress tolerance and failure recovery. In essence, they will try to break what they have built. This testing period will produce a documented, prioritized list of suggested changes to the cyberinfrastructure architecture and components, which will then be implemented for the initial public release of the DataONE infrastructure.
Community engagement is broadly defined to include an array of activities associated with building the DataONE organization; engaging the scientific community and other stakeholders in developing and using DataONE; and enabling new science. Besides establishing and staffing the headquarters and Coordinating Nodes, a detailed Project Management Plan was developed, and a routine reporting structure (weekly calls, quarterly and annual reports) was implemented. Other key accomplishments include:
- The 12-member External Advisory Board (chaired by Dr. Berrien Moore, University of Oklahoma; vice-chaired by Dr. Liz Lyon, UK Digital Curation Centre) has been established and held its first official meeting in July 2010. The Board includes stellar representation from academia, business, and government, and we look forward to its continued and invaluable guidance. The Board members are:
  - Rosio Alvarez, Lawrence Berkeley National Laboratory
  - Adrian Burton, Australian National Data Service
  - Nancy Grimm, Arizona State University
  - Kevin Guthrie, ITHAKA
  - Tony Hey, Microsoft Research
  - Rick Luce, Emory University
  - Cliff Lynch, Coalition for Networked Information
  - Liz Lyon, Digital Curation Centre, United Kingdom
  - Martha Maiden, NASA-HQ
  - Berrien Moore, University of Oklahoma
  - Paul Risser, University of Oklahoma
  - Brian Schottlaender, UC-San Diego
- DataONE has held an All Hands Meeting and launched three planned Working Groups (Sustainability and Governance, which met twice; Socio-cultural, which met once; and Usability and Assessment, which met twice), as well as one new Working Group (Exploration, Visualization and Analysis, or EVA) that was created so that scientists could drive the design and implementation of the cyberinfrastructure.
- The Usability and Assessment Working Group, led by Carol Tenopir (University of Tennessee) and Mike Frame (USGS), completed a baseline assessment of scientists and will soon report its findings.
- Eight students successfully completed a summer research experience with mentors from DataONE. More on this exciting program will appear in future news reports.
- Our first societal training workshop was organized by Robert Cook (Oak Ridge National Laboratory) and held in early August at the Ecological Society of America's annual meeting in Pittsburgh.
  Topics and instructors included:
  - Fundamental Practices for Preparing Data Sets (Bob Cook, ORNL)
  - Collaboration and Data Sharing (Stefanie Hampton, NCEAS)
  - Organizing Data Sets (Workshop Team; participants were encouraged to bring their own data sets)
  - Metadata (Viv Hutchison, USGS)
  - Elements of a Data Management Plan (Bill Michener, University of New Mexico)
- Planning has begun for the first DataONE Users Group meeting, which will be held in December 2010. This first meeting will be a design meeting, and its two dozen or so invitees will establish this new organization.
- A prototype web portal that provides access to a best practices database and a tools database has been completed and is being readied for official release in September.
- The next DataONE All Hands Meeting is set for November 2-4, and an agenda will be sent to all Working Group members soon.
- The DataNet Program will undergo an 18-month NSF review in February 2011. We look forward to discussing our past accomplishments and future plans at that time, as well as receiving constructive feedback from the review team.
Other News and Tidbits.
A DataONE Success Story. The EVA Working Group, led by Steve Kelling (Cornell Lab of Ornithology) and Bob Cook (ORNL), has been phenomenally productive over its short existence. First, the group worked with the Cornell Lab of Ornithology to develop the eBird Gulf Spill Bird Tracker (see http://ebird.org/tools/oilspill/). This Cornell Lab of Ornithology web site has been instrumental in forecasting the oil spill's impacts and in tracking its effects on avifauna. The tracker exemplifies the socially relevant science that DataONE is being designed to enable. Second, the EVA Working Group project on the dynamics of continental-scale bird migration has achieved notable success. The work was recently featured in Nature (https://doi.org/10.1038/news.2010.395) and highlighted at the August TeraGrid meeting in a standing-room-only presentation by Daniel Fink (Cornell Lab of Ornithology and EVA member). Moreover, this case study prompted an award of 100,000 hours of TeraGrid computing resources, which will provide the foundation for the analyses underpinning the 2011 State of the Birds report. Congratulations to the entire EVA Working Group and their partners!
The academic year has started once again. What an incredibly short summer!
Best wishes for the start of fall.