Title | Type of Publication | Year of Publication | Authors | Journal Title | Abstract | DOI | Issue | Pagination | Volume
Data Discovery | Book Chapter | 2018 | W.K. Michener | Ecological Informatics. Data Management and Knowledge Discovery | DOI: 10.1007/978-3-319-59928-1 | pp. 115-128
Quality assurance and quality control (QA/QC) | Book Chapter | 2018 | W.K. Michener | Ecological Informatics. Data Management and Knowledge Discovery | DOI: 10.1007/978-3-319-59928-1 | pp. 55-70
Creating and managing metadata | Book Chapter | 2018 | W.K. Michener | Ecological Informatics. Data Management and Knowledge Discovery | DOI: 10.1007/978-3-319-59928-1 | pp. 71-88
Communicating and disseminating research findings | Book Chapter | 2018 | A.E. Budden; W.K. Michener | Ecological Informatics. Data Management and Knowledge Discovery | DOI: 10.1007/978-3-319-59928-1 | pp. 289-317
Data integration: principles and practice | Book Chapter | 2018 | M. Schildhauer | Ecological Informatics. Data Management and Knowledge Discovery | DOI: 10.1007/978-3-319-59928-1 | pp. 129-157
A Science Products Inventory for Citizen-Science Planning and Evaluation | Journal Article | 2018 | A. Wiggins; R. Bonney; G. LeBuhn; J.K. Parrish; J.F. Weltzin | BioScience | DOI: 10.1093/bioscience/biy028 | pp. biy028
Eleven quick tips for finding research data | Journal Article | 2018 | K. Gregory; S.J. Khalsa; W.K. Michener; F.E. Psomopoulos; A. de Waard; M. Wu | PLoS Comput Biol | DOI: 10.1371/journal.pcbi.1006038 | issue 4 | vol. 14
The Bari Manifesto: An interoperability framework for essential biodiversity variables | Journal Article | 2018 | A.R. Hardisty; W.K. Michener; D. Agosti; E. Alonso García; L. Bastin; L. Belbin; A. Bowser; P. Luigi Buttigieg; D.A.L. Canhos; W. Egloff; R. De Giovanni; R. Figueira; Q. Groom; R.P. Guralnick; D. Hobern; W. Hugo; D. Koureas; J. Liqiang; W. Los; J. Manuel; D. Manset; J. Poelen; H. Saarenmaa; D. Schigel; P.F. Uhlir; D. Kissling | Ecological Informatics

Essential Biodiversity Variables (EBV) are fundamental variables that can be used for assessing biodiversity change over time, for determining adherence to biodiversity policy, for monitoring progress towards sustainable development goals, and for tracking biodiversity responses to disturbances and management interventions. Data from observations or models that provide measured or estimated EBV values, which we refer to as EBV data products, can help to capture the above processes and trends and can serve as a coherent framework for documenting trends in biodiversity. Using primary biodiversity records and other raw data as sources to produce EBV data products depends on cooperation and interoperability among multiple stakeholders, including those collecting and mobilising data for EBVs and those producing, publishing and preserving EBV data products. Here, we encapsulate ten principles for the current best practice in EBV-focused biodiversity informatics as ‘The Bari Manifesto’, serving as implementation guidelines for data and research infrastructure providers to support the emerging EBV operational framework based on trans-national and cross-infrastructure scientific workflows. The principles provide guidance on how to contribute towards the production of EBV data products that are globally oriented, while remaining appropriate to the producer's own mission, vision and goals. These ten principles cover: data management planning; data structure; metadata; services; data quality; workflows; provenance; ontologies/vocabularies; data preservation; and accessibility. For each principle, desired outcomes and goals have been formulated. Some specific actions related to fulfilling the Bari Manifesto principles are highlighted in the context of each of four groups of organizations contributing to enabling data interoperability - data standards bodies, research data infrastructures, the pertinent research communities, and funders. 
The Bari Manifesto provides a roadmap enabling support for routine generation of EBV data products, and increases the likelihood of success for a global EBV framework.

Facilitating and Improving Environmental Research Data Repository Interoperability | Journal Article | 2018 | C. Gries; A. Budden; C. Laney; M. O'Brien; M. Servilla; W. Sheldon; K. Vanderbilt; D. Vieglais | Data Science Journal | DOI: 10.5334/dsj-2018-022 | vol. 17
Research Data Sharing: Practices and Attitudes of Geophysicists | Journal Article | 2018 | C. Tenopir; L. Christian; S. Allard; J. Borycz | Earth and Space Science

Open data policies have been introduced by governments, funders, and publishers over the past decade. Previous research showed a growing recognition by scientists of the benefits of data-sharing and reuse, but actual practices lag and are not always compliant with new regulations. The goal of this study is to investigate motives, attitudes, and data practices of the community of earth and planetary geophysicists, a discipline believed to have accepting attitudes towards data sharing and reuse. A better understanding of the attitudes and current data-sharing practices of this scientific community could enable funders, publishers, data managers, and librarians to design systems and services that help scientists understand and adhere to mandates and to create practices, tools, and services that are scientist-focused. An online survey was distributed to the members of the American Geophysical Union (AGU), producing 1372 responses from 116 countries. The attitudes of researchers to data sharing and reuse were generally positive, but in practice scientists had concerns about sharing their own research data. These concerns include the possibility of potential data misuse and the need for assurance of proper citation and acknowledgement. Training and assistance in good data management practices are lacking in many scientific fields and might help to alleviate these doubts.

Using Peer Review to Support Development of Community Resources for Research Data Management | Journal Article | 2017 | H. Soyka; A. Budden; V. Hutchison; D. Bloom; J. Duckles; A. Hodge; M.S. Mayernik; T. Poisot; S. Rauch; G. Steinhart; L. Wasser; A.L. Whitmire; S. Wright | Journal of eScience Librarianship | DOI: https://doi.org/10.7191/jeslib.2017.1114 | issue 2 | vol. 6
The influence of community recommendations on metadata completeness | Journal Article | 2017 | S. Gordon; T. Habermann | Ecological Informatics

Many communities use standard, structured documentation that is machine-readable, i.e. metadata, to make discovery, access, use, and understanding of scientific datasets possible. Organizations and communities have also developed recommendations for metadata content that is required or suggested for their data developers and users. These recommendations are typically specific to metadata representations (dialects) used by the community. By considering the conceptual content of the recommendations, quantitative analysis and comparison of the completeness of multiple metadata dialects becomes possible. This is a study of completeness of EML and CSDGM metadata records from DataONE in terms of the LTER recommendation for Completeness. The goal of the study is to quantitatively measure completeness of metadata records and to determine if metadata developed by LTER is more complete with respect to the recommendation than other collections in EML and in CSDGM. We conclude that the LTER records are broadly more complete than the other EML collections, but similar in completeness to the CSDGM collections.

DOI: https://doi.org/10.1016/j.ecoinf.2017.09.005
Attitudes and norms affecting scientists’ data reuse | Journal Article | 2017 | R. Gonçalves Curty; K. Crowston; A. Specht; B.W. Grant; E.D. Dalton | PLOS ONE

The value of sharing scientific research data is widely appreciated, but factors that hinder or prompt the reuse of data remain poorly understood. Using the Theory of Reasoned Action, we test the relationship between the beliefs and attitudes of scientists towards data reuse, and their self-reported data reuse behaviour. To do so, we used existing responses to selected questions from a worldwide survey of scientists developed and administered by the DataONE Usability and Assessment Working Group (thus practicing data reuse ourselves). Results show that the perceived efficacy and efficiency of data reuse are strong predictors of reuse behaviour, and that the perceived importance of data reuse corresponds to greater reuse. Expressed lack of trust in existing data and perceived norms against data reuse were not found to be major impediments for reuse contrary to our expectations. We found that reported use of models and remotely-sensed data was associated with greater reuse. The results suggest that data reuse would be encouraged and normalized by demonstration of its value. We offer some theoretical and practical suggestions that could help to legitimize investment and policies in favor of data sharing.

DataONE: A Data Federation with Provenance Support | Book Chapter | 2016 | Y. Cao; C. Jones; V. Cuevas-Vicenttín; M.B. Jones; B. Ludäscher; T. McPhillips; P. Missier; C. Schwalm; P. Slaughter; D. Vieglais; L. Walker; Y. Wei | Provenance and Annotation of Data and Processes: 6th International Provenance and Annotation Workshop, IPAW 2016, McLean, VA, USA, June 7-8, 2016, Proceedings | pp. 230-234
Climate and Sustainability| Dominant Visual Frames in Climate Change News Stories: Implications for Formative Evaluation in Climate Change Campaigns | Journal Article | 2016 | S. Rebich-Hespanha; R.E. Rice | International Journal of Communication | vol. 10
Understanding Scientific Data Sharing Outside of the Academy | Conference Paper | 2016 | D. Pollock | Proceedings of the 79th ASIS&T Annual Meeting: Creating Knowledge, Enhancing Lives Through Information & Technology
Research Data Services in European and North American Libraries: Current Offerings and Plans for the Future | Conference Paper | 2016 | C. Tenopir; D. Pollock; S. Allard; D. Hughes | Proceedings of the 79th ASIS&T Annual Meeting: Creating Knowledge, Enhancing Lives Through Information & Technology
Computational provenance: DataONE and implications for cultural heritage institutions | Conference Paper | 2016 | R.J. Sandusky | 2016 IEEE International Conference on Big Data (Big Data) | DOI: 10.1109/BigData.2016.7840984
Provenance Storage, Querying, and Visualization in PBase | Book Chapter | 2015 | V. Cuevas-Vicenttín; P. Kianmajd; B. Ludäscher; P. Missier; F. Chirigati; Y. Wei; D. Koop; S. Dey | Provenance and Annotation of Data and Processes | DOI: 10.1007/978-3-319-16462-5 | pp. 239-241
Provenance-Based Searching and Ranking for Scientific Workflows | Book Chapter | 2015 | V. Cuevas-Vicenttín; B. Ludäscher; P. Missier | Provenance and Annotation of Data and Processes | DOI: 10.1007/978-3-319-16462-5_17 | pp. 209-214
Make Data Count - Unit 1 Final Report | Journal Article | 2015 | P.L.O.S. ALM; C. Strasser; J. Kratz; J. Lin
Perceived discontinuities and continuities in transdisciplinary scientific working groups | Journal Article | 2015 | K. Crowston; A. Specht; C. Hoover; K.M. Chudoba; M. Beth Watson-Manheim | Science of The Total Environment | DOI: 10.1016/j.scitotenv.2015.04.121
Ecological data sharing | Journal Article | 2015 | W.K. Michener | Journal of Ecological Informatics | DOI: 10.1016/j.ecoinf.2015.06.010 | vol. 29
Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide | Journal Article | 2015 | C. Tenopir; E.D. Dalton; S. Allard; M. Frame; I. Pjesivac; B. Birch; D. Pollock; K. Dorsett | PLoS ONE | DOI: 10.1371/journal.pone.0134826 | issue 8 | vol. 10
CitSci.org: A New Model for Managing, Documenting, and Sharing Citizen Science Data | Journal Article | 2015 | Y. Wang; N. Kaplan; G. Newman; R. Scarpino | PLoS Biol

This Community Page proposes a platform to support effective metadata documentation with a view to improving the discoverability and reusability of citizen-gathered science data.

issue 10 | pp. e1002280 | vol. 13
“Personas” to Support Development of Cyberinfrastructure for Scientific Data Sharing. | Journal Article | 2015 | K. Crowston | Journal of eScience Librarianship | DOI: 10.7191/jeslib.2015.1082 | issue 2 | vol. 4
Ten Simple Rules for Creating a Good Data Management Plan | Journal Article | 2015 | W.K. Michener | PLoS Comput Biol | DOI: 10.1371/journal.pcbi.1004525 | issue 10 | vol. 11
Research Data Services in Academic Libraries: Data Intensive Roles for the Future? | Journal Article | 2015 | C. Tenopir; D. Hughes; S. Allard; M. Frame; B. Birch; L. Baird; R. Sandusky; M. Langseth; A. Lundeen | Journal of eScience Librarianship | DOI: 10.7191/jeslib.2015.1085 | issue 2 | vol. 4
YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts | Journal Article | 2015 | T. McPhillips; T. Song; T. Kolisnik; S. Aulenbach; K. Belhajjame; K. Bocinsky; Y. Cao; J. Cheney; F. Chirigati; S. Dey; J. Freire; C. Jones; J. Hanken; K.W. Kintigh; T.A. Kohler; D. Koop; J.A. Macklin; P. Missier; M. Schildhauer; C. Schwalm; Y. Wei; M. Bieda; B. Ludäscher | International Journal of Digital Curation | pp. 298–313 | vol. 10
Correction: CitSci.org: A New Model for Managing, Documenting, and Sharing Citizen Science Data | Journal Article | 2015 | Y. Wang; N. Kaplan; G. Newman; R. Scarpino | PLoS biology | DOI: 10.1371/journal.pbio.1002343 | vol. 13
Making data count | Journal Article | 2015 | J.E. Kratz; C. Strasser | Scientific Data | DOI: 10.1038/sdata.2015.39 | pp. 150039 | vol. 2
The Tao of Open Science for Ecology | Journal Article | 2015 | S.E. Hampton; S. Anderson; S.C. Bagby; C. Gries; X. Han; E. Hart; M.B. Jones; C. Lenhardt; A. MacDonald; W. Michener; J.F. Mudge; P. A; M. Schildhauer; K.H. Woo; N. Zimmerman | Ecosphere | DOI: http://dx.doi.org/10.1890/ES14-00402.1 | issue 7 | vol. 6
Computing Location-Based Lineage from Workflow Specifications to Optimize Provenance Queries | Book Chapter | 2015 | S. Dey; S. Köhler; S. Bowers; B. Ludäscher | Provenance and Annotation of Data and Processes | DOI: 10.1007/978-3-319-16462-5_14 | pp. 180-193 | vol. 8628
The Backstage Work of Data Sharing | Conference Paper | 2014 | K. Kervin; R.B. Cook; W.K. Michener | Proceedings of the 18th International Conference on Supporting Group Work | DOI: 10.1145/2660398.2660406
Realizing the Value of a National Asset: Scientific Data | Journal Article | 2014 | A. Wilson; R.R. Downs; C. Lenhardt; C. Meyer; W. Michener; H. Ramapriyan; E. Robinson | DOI: 10.1002/2014EO500006 | pp. 477–478 | vol. 95
Managing scientific data as public assets: Data sharing practices and policies among full-time government employees | Journal Article | 2014 | K. Douglass; S. Allard; C. Tenopir; L. Wu; M. Frame | DOI: 10.1002/asi.22988 | pp. 251–262 | vol. 65
SWEET ontology coverage for earth system sciences | Journal Article | 2014 | N. DiGiuseppe; L.C. Pouchard; N.F. Noy | DOI: 10.1007/s12145-013-0143-1 | pp. 1-16
Research data management services in academic research libraries and perceptions of librarians | Journal Article | 2014 | C. Tenopir; R.J. Sandusky; S. Allard; B. Birch | Library & Information Science Research | DOI: 10.1016/j.lisr.2013.11.003 | issue 2 | vol. 36
Evaluating a Complex Project | Book Chapter | 2014 | S. Allard | Research Data Management: Practical Strategies for Information Professionals | p. 255
Examining data sharing and data reuse in the dataone environment | Journal Article | 2014 | A.P. Murillo | Proceedings of the American Society for Information Science and Technology | DOI: 10.1002/meet.2014.14505101155 | pp. 1–5 | vol. 51
Data narratives: Increasing scholarly value | Journal Article | 2014 | L. Pouchard; A. Barton; L. Zilinski | Proceedings of the American Society for Information Science and Technology | DOI: 10.1002/meet.2014.14505101088 | pp. 1–4 | vol. 51
The PBase scientific workflow provenance repository | Journal Article | 2014 | V. Cuevas-Vicenttín; P. Kianmajd; B. Ludäscher; P. Missier; F. Chirigati; Y. Wei; D. Koop; S. Dey | International Journal of Digital Curation | DOI: 10.2218/ijdc.v9i2.332 | pp. 28–38 | vol. 9
DMPTool 2: Expanding Functionality for Better Data Management Planning | Journal Article | 2014 | C. Strasser; S. Abrams; P. Cruse | International Journal of Digital Curation | DOI: 10.2218/ijdc.v9i1.319 | pp. 324–330 | vol. 9
Next Steps for Citizen Science | Journal Article | 2014 | R. Bonney; J.L. Shirk; T.B. Phillips; A. Wiggins; H.L. Ballard; A.J. Miller-Rushing; J.K. Parrish | Science | DOI: 10.1126/science.1251554 | pp. 1436–1437 | vol. 343
Dmptool: Guidance and Resources for Your Data Management Plan; https://dmp.cdlib.org/ | Journal Article | 2014 | M. Mallery | Technical Services Quarterly | DOI: 10.1080/07317131.2014.875394 | pp. 197-199 | vol. 31
Constructing the Role of School Librarians in the 21st Century Workforce: Implications of NSF-Funded DataONE for K-12 Librarianship | Journal Article | 2014 | K. Douglass; D. Bilal | iConference 2014 Proceedings
DataUp: A tool to help researchers describe and share tabular data | Journal Article | 2014 | C. Strasser; J. Kunze; S. Abrams; P. Cruse | F1000Research

Scientific datasets have immeasurable value, but they lose their value over time without proper documentation, long-term storage, and easy discovery and access. Across disciplines as diverse as astronomy, demography, archeology, and ecology, large numbers of small heterogeneous datasets (i.e., the long tail of data) are especially at risk unless they are properly documented, saved, and shared. One unifying factor for many of these at-risk datasets is that they reside in spreadsheets. In response to this need, the California Digital Library (CDL) partnered with Microsoft Research Connections and the Gordon and Betty Moore Foundation to create the DataUp data management tool for Microsoft Excel. Many researchers creating these small, heterogeneous datasets use Excel at some point in their data collection and analysis workflow, so we were interested in developing a data management tool that fits easily into those workflows and minimizes the learning curve for researchers. The DataUp project began in August 2011. We first formally assessed the needs of researchers by conducting surveys and interviews of our target research groups: earth, environmental, and ecological scientists. We found that, on average, researchers had very poor data management practices, were not aware of data centers or metadata standards, and did not understand the benefits of data management or sharing. Based on our survey results, we composed a list of desirable components and requirements and solicited feedback from the community to prioritize potential features of the DataUp tool. These requirements were then relayed to the software developers, and DataUp was successfully launched in October 2012.

SemantEco: A semantically powered modular architecture for integrating distributed environmental and ecological data | Journal Article | 2014 | E.W. Patton; P. Seyed; P. Wang; L. Fu; J. Dein; S. Bristol; D.L. McGuinness | Future Generation Computer Systems

We aim to inform the development of decision support tools for resource managers who need to examine large complex ecosystems and make recommendations in the face of many tradeoffs and conflicting drivers. We take a semantic technology approach, leveraging background ontologies and the growing body of linked open data. In previous work, we designed and implemented a semantically enabled environmental monitoring framework called SemantEco and used it to build a water quality portal named SemantAqua. Our previous system included foundational ontologies to support environmental regulation violations and relevant human health effects. In this work, we discuss SemantEco’s new architecture that supports modular extensions and makes it easier to support additional domains. Our enhanced framework includes foundational ontologies to support modeling of wildlife observation and wildlife health impacts, thereby enabling deeper and broader support for more holistically examining the effects of environmental pollution on ecosystems. We conclude with a discussion of how, through the application of semantic technologies, modular designs will make it easier for resource managers to bring in new sources of data to support more complex use cases.

DOI: http://dx.doi.org/10.1016/j.future.2013.09.017 | pp. 430-440 | vol. 36
UV-CDAT: Analyzing Climate Datasets from a User's Perspective | Magazine Article | 2013 | E. Santos | Computing in Science and Engineering | issue 1 | pp. 94-103 | vol. 15
Ontological Empowerment: Sustainability via Ownership | Conference Paper | 2013 | J. Greenberg; A. Murillo; J.K. Kunze

Positive impacts associated with urban housing/home ownership programs motivate us to study this topic in relation to ontologies. This paper reviews ontological dependence and presents early work underway in the DataONE Preservation and Metadata Working Group (PAMWG) to collectively leverage existing metadata schemes and ontologies. The paper introduces a high-level set of functional requirements and the stackoverflow model that may be used to detect highly rated metadata or ontological properties to form a loose canon for describing scientific data. The long-term goal is to establish community identity and rhythm supporting a sustainable ontology/metadata-driven workflow.

Big data and the future of ecology | Journal Article | 2013 | S.E. Hampton; C.A. Strasser; J.J. Tewksbury; W.K. Gram; A.E. Budden; A.L. Batcheller; C.S. Duke; J.H. Porter | DOI: 10.1890/120103 | issue 3 | pp. 156-162 | vol. 11
A Linked Science investigation: enhancing climate change data discovery with semantic technologies | Journal | 2013 | L.C. Pouchard; M.L. Branstetter; R. Cook; R. Devarakonda; J. Green; G. Palanisamy; P. Alexander; N.F. Noy | Earth Science Informatics

Linked Science is the practice of inter-connecting scientific assets by publishing, sharing and linking scientific data and processes in end-to-end loosely coupled workflows that allow the sharing and re-use of scientific data. Much of this data does not live in the cloud or on the Web, but rather in multi-institutional data centers that provide tools and add value through quality assurance, validation, curation, dissemination, and analysis of the data. In this paper, we make the case for the use of scientific scenarios in Linked Science. We propose a scenario in river-channel transport that requires biogeochemical experimental data and global climate-simulation model data from many sources. We focus on the use of ontologies—formal machine-readable descriptions of the domain—to facilitate search and discovery of this data. Mercury, developed at Oak Ridge National Laboratory, is a tool for distributed metadata harvesting, search and retrieval. Mercury currently provides uniform access to more than 100,000 metadata records; 30,000 scientists use it each month. We augmented search in Mercury with ontologies, such as the ontologies in the Semantic Web for Earth and Environmental Terminology (SWEET) collection by prototyping a component that provides access to the ontology terms from Mercury. We evaluate the coverage of SWEET for the ORNL Distributed Active Archive Center (ORNL DAAC).

Academic librarians and research data services: Preparation and attitudes | Journal Article | 2013 | C. Tenopir; R.J. Sandusky; S. Allard; B. Birch | IFLA Journal | DOI: http://dx.doi.org/10.1177/0340035212473089 | issue 1 | vol. 39
NSF DataNet: Curating Scientific Data | Journal Article | 2013 | J. Kunze; S. Choudhury
Automatic Tag Recommendation for Metadata Annotation Using Probabilistic Topic Modeling | Conference Paper | 2013 | S. Tuarob; L.C. Pouchard; L. Giles | Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries | DOI: 10.1145/2467696.2467706
Data reuse and scholarly reward: understanding practice and building infrastructure | Journal Article | 2013 | T.J. Vision; H.A. Piwowar | PeerJ PrePrints
D-PROV: extending the PROV provenance model with workflow structure | Conference Paper | 2013 | P. Missier; S.C. Dey; K. Belhajjame; V. Cuevas-Vicenttín; B. Ludäscher | TaPP
The DMPTool and DataUp: Helping Researchers Manage, Archive, and Share their Data | Conference Paper | 2013 | C. Strasser; P. Cruse | Research Data Management Implementations Workshop
Participatory design of DataONE—Enabling cyberinfrastructure for the biological and environmental sciences | Journal Article | 2012 | W.K. Michener; S. Allard; A.E. Budden; R. Cook; K. Douglass; M. Frame; S. Kelling; R.J. Koskela; C. Tenopir; D.A. Vieglais | Ecological Informatics | DOI: 10.1016/j.ecoinf.2011.08.007 | Sep 2012 | vol. 11
Data-intensive science applied to broad-scale citizen science | Journal Article | 2012 | W.M. Hochachka; D. Fink; R.A. Hutchinson; D. Sheldon; W.K. Wong; S. Kelling | Trends in Ecology & Evolution | DOI: 10.1016/j.tree.2011.11.006
Ecoinformatics: supporting ecology as a data-intensive science | Journal | 2012 | W.K. Michener; M.B. Jones | Trends in Ecology & Evolution

Ecology is evolving rapidly and increasingly changing into a more open, accountable, interdisciplinary, collaborative and data-intensive science. Discovering, integrating and analyzing massive amounts of heterogeneous data are central to ecology as researchers address complex questions at scales from the gene to the biosphere. Ecoinformatics offers tools and approaches for managing ecological data and transforming the data into information and knowledge. Here, we review the state-of-the-art and recent advances in ecoinformatics that can benefit ecologists and environmental scientists as they tackle increasingly challenging questions that require voluminous amounts of data across disciplines and scales of space and time. We also highlight the challenges and opportunities that remain.

DataONE: Facilitating eScience through Collaboration | Journal Article | 2012 | S. Allard | Journal of eScience Librarianship

Objective: To introduce DataONE, a multiinstitutional, multinational, and interdisciplinary collaboration that is developing the cyberinfrastructure and organizational structure to support the full information lifecycle of biological, ecological, and environmental data and tools to be used by researchers, educators, and the public at large.
Setting: The dynamic world of data intensive science at the point it interacts with the grand challenges facing environmental sciences.
Methods: Briefly discuss science’s “fourth paradigm,” then introduce how DataONE is being developed to answer the challenges presented by this new environment. Sociocultural perspectives are the primary focus of the discussion.
Results: DataONE is highly collaborative. This is a result of its cyberinfrastructure architecture, its interdisciplinary nature, and its organizational diversity. The organizational structure of an agile management team, diverse leadership team, and productive working groups provides for a successful collaborative environment where substantial contributions to the DataONE mission have been made by a large number of people.
Conclusions: Librarians and information science researchers are key partners in the development of DataONE. These roles are likely to grow as more scientists engage data at all points of the data lifecycle.

Ecological data in the Information Age | Journal Article | 2012 | S.E. Hampton; J.J. Tewksbury; C.A. Strasser | Frontiers in Ecology and the Environment | DOI: 10.1890/1540-9295-10.2.59 | issue 2 | pp. 59-59 | vol. 10
Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository | Journal Article | 2012 | P. Missier; B. Ludäscher; S. Dey; M. Wang; T. McPhillips; S. Bowers; M. Agun; I. Altintas | International Journal of Digital Curation | DOI: 10.2218/ijdc.v7i1.221 | issue 1 | vol. 7
DataONE: A Distributed Environmental and Earth Science Data Network Supporting the Full Data Life Cycle | Conference Paper | 2012 | R. Cook; W.K. Michener; D.A. Vieglais; A.E. Budden; R.J. Koskela | EGU General Assembly Conference Abstracts
The future of citizen science: emerging technologies and shifting paradigms | Journal Article | 2012 | G. Newman; A. Wiggins; A. Crall; E. Graham; S. Newman; K. Crowston | Frontiers in Ecology and the Environment | issue 6 | pp. 298-304 | vol. 10
Exploring the Motive for Data Publication in Open Data Initiative: Linking Intention to Action | Conference Paper | 2012 | D.S. Sayogo; T.A. Pardo | 2012 45th Hawaii International Conference on System Sciences (HICSS) | DOI: 10.1109/HICSS.2012.271
Exploring the determinants of scientific data sharing: Understanding the motivation to publish research data | Journal Article | 2012 | D.S. Sayogo; T.A. Pardo | Government Information Quarterly | DOI: 10.1016/j.giq.2012.06.011
Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practice | Journal Article | 2012 | R. Littauer; K. Ram; B. Ludäscher; W.K. Michener; R.J. Koskela | International Journal of Digital Curation | DOI: 10.2218/ijdc.v7i2.232 | issue 2 | vol. 7
The fractured lab notebook: undergraduates and ecological data management training in the United States | Journal Article | 2012 | C.A. Strasser; S.E. Hampton | Ecosphere | DOI: 10.1890/ES12-00139.1 | issue 12 | pp. art116 | vol. 3
Citizen science comes of age | Journal Article | 2012 | S. Henderson | Frontiers in Ecology and the Environment | DOI: 10.1890/1540-9295-10.6.283 | issue 6 | pp. 283-283 | vol. 10
The history of public participation in ecological research | Journal Article | 2012 | A. Miller-Rushing; R. Primack; R. Bonney | Frontiers in Ecology and the Environment | DOI: 10.1890/110278 | issue 6 | pp. 285-290 | vol. 10
The current state of citizen science as a tool for ecological research and public engagement | Journal Article | 2012 | J.L. Dickinson; J. Shirk; D. Bonter; R. Bonney; R.L. Crain; J. Martin; T. Phillips; K. Purcell | Frontiers in Ecology and the Environment | DOI: 10.1890/110236 | issue 6 | pp. 291-297 | vol. 10
Insects and plants: engaging undergraduates in authentic research through citizen science | Journal Article | 2012 | K. Oberhauser; G. LeBuhn | Frontiers in Ecology and the Environment | DOI: 10.1890/110274 | issue 6 | pp. 318-320 | vol. 10
From Caprio's lilacs to the USA National Phenology Network | Journal Article | 2012 | M.D. Schwartz; J.L. Betancourt; J.F. Weltzin | Frontiers in Ecology and the Environment | DOI: 10.1890/110281 | issue 6 | pp. 324-327 | vol. 10
Academic libraries and research data services: Current practices and plans for the future | Report | 2012 | C. Tenopir; B. Birch; S. Allard | Academic libraries and research data services: Current practices and plans for the future
Documenting and Sharing Scientific Research over the Semantic Web | Conference Paper | 2012 | A. Gándara; N. Villanueva-Rosales | Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies | DOI: 10.1145/2362456.2362480
DataUp: Further Development and Community Building | Web Article | 2012 | P. Cruse; C. Strasser; W. Michener; J. Kunze; D. Vieglais | eScholarship CDL Staff Publications | 2012
DataONE Member Node Pilot Integration with TeraGrid? | Conference Paper | 2011 | N.C. Dexter; J.W. Cobb; D.A. Vieglais; M.B. Jones; M. Lowe | Conference Proceedings of the 2011 TeraGrid Conference on Extreme Digital Discovery - TG '11 | DOI: 10.1145/2016741; 10.1145/2016741.2016756 | 1
Challenges and Opportunities of Open Data in Ecology | Journal Article | 2011 | O.J. Reichman; M.B. Jones; M.P. Schildhauer | Science | DOI: 10.1126/science.1197962 | issue 6018 | pp. 703-705 | vol. 331
Data archiving is a good investment | Journal Article | 2011 | H.A. Piwowar; T.J. Vision; M.C. Whitlock | Nature | DOI: 10.1038/473285a | issue 7347 | pp. 285-285 | vol. 473
Data Sharing by Scientists: Practices and Perceptions | Journal Article | 2011 | C. Tenopir; S. Allard; K. Douglass; A.U. Aydinoglu; L. Wu; E. Read; M. Manoff; M. Frame | PLoS ONE | DOI: 10.1371/journal.pone.0021101 | issue 6 | vol. 6
Emergent Filters: Automated Data Verification in a Large-Scale Citizen Science Project | Conference Paper | 2011 | S. Kelling; J. Yu; J. Gerbracht; W.K. Wong | 2011 IEEE Seventh International Conference on e-Science Workshops (eScienceW) | DOI: 10.1109/eScienceW.2011.13 | pp. 20-27
Understanding the Capabilities and Critical Success Factors for Scientific Data Sharing in DataONE Collaborative Network | Conference Paper | 2011 | D.S. Sayogo; T.A. Pardo | Proceedings of the 12th Annual International Digital Government Research Conference: Digital Government Innovation in Challenging Times | DOI: 10.1145/2037556; 10.1145/2037556.2037568
Exploring the determinants of publication of scientific data in open data initiative | Conference Paper | 2011 | T.A. Pardo; D.S. Sayogo | Proceedings of the 5th International Conference on Theory and Practice of Electronic Governance

This research provides a preliminary analysis of the determinants of researchers' likelihood of publishing their research datasets online. The data is derived from a preliminary survey conducted as part of the DataONE project, an international federated data repository of ecological data. The survey of 1,329 researchers was conducted by the Usability and Assessment Working Group of DataONE from October 2009 to July 2010. The analysis of the data is threefold: visualization of a 2-mode network, descriptive statistics, and ordered logistic regression. The visualization of the affiliated network shows a disconnected access pattern, with the majority of researchers accessing one database and only a few accessing more than one. The descriptive and inferential statistics point to two key determinants of publishing research datasets online: data management and attribution to the dataset's owner. The importance of data management manifests in two ways: the significance of data management skills and of organizational support for data management.

Dataone: Data observation network for earth-preserving data and enabling innovation in the biological and environmental sciences | Journal Article | 2011 | W. Michener; D. Vieglais; T. Vision; J. Kunze; P. Cruse; G. Janée | D-Lib Magazine | issue 3 | vol. 17
A method to track dataset reuse in biomedicine: Filtered GEO accession numbers in PubMed Central | Journal Article | 2010 | H.A. Piwowar | Proceedings of the American Society for Information Science and Technology

Reusing research data has important potential benefits: generative science and efficient resource use. Tracking the reuse of research datasets would allow us to understand whether the potential benefits are indeed realized, enable recognition of investigators who produce, annotate, and share useful data, and inform data sharing and reuse initiatives, tools, and policies.

Unfortunately, the lack of clear attribution practices for data makes automated tracking of data reuse difficult. I present a method for tracking research data reuse that takes advantage of the community norms around gene expression microarray data sharing and the rich NCBI Entrez resources. Specifically, the full text of papers stored in PubMed Central is queried for accession numbers of datasets archived in NCBI's Gene Expression Omnibus (GEO) repository. Studies known to have created microarray data are excluded through automated filters and guided manual curation. MeSH terms attached to the data creation and data reuse studies provide additional information for analysis. Finally, I extrapolate the findings to all of PubMed.

Automated portions of this method have been implemented in Python and are openly available. Although imperfect, this dataset is a valuable initial resource for research into patterns of data reuse.

Quantitatively Evaluating Data Citation and Sharing Policies in the Earth Sciences | Presentation | 2010 | N. Weber