Kate is a Master's Candidate at the University of Denver in the field of Library and Information Science. She studied English Literature at the University of Prince Edward Island in PEI, Canada. She originally hails from the Land of Enchantment and enjoys hiking in the Rocky Mountains and the sights and sounds of the Mile High City!
The Research Data Alliance (RDA) Metadata Standards Directory Working Group (MSDWG) is developing a prototype wiki-based directory of metadata standards applicable to scientific data. The initial emphasis is on widely used, domain-community-endorsed metadata standards and schemas with significant interoperation and re-use capability. The Digital Curation Centre (DCC) has compiled a catalog of disciplinary metadata standards, use cases for these standards, and tools that implement them; this catalog can be found on the DCC Disciplinary Metadata page (http://www.dcc.ac.uk/resources/metadata-standards). The RDA MSDWG has joined with the DCC to extend this catalog of disciplinary metadata, and has circulated a survey requesting additional information about metadata standards, the tools and use cases associated with them, and where and how scientists use them worldwide. The goals of this summer project are to (1) continue work on a prototype catalog system that (a) is wiki based, supporting community participation, (b) standardizes the presentation of information about metadata standards, tools, and use cases, and (c) enables catalog contributions, updates, and corrections by directly filling out a template, eliminating the need to transfer information from the survey; and (2) conduct a survey, building off of the fall 2013 survey, to (a) gather new information and updates about metadata standards, tools, and use cases and (b) test the functionality of the prototype catalog.
The long-term goal of the MSDWG is a community-sustainable directory of metadata standards.
Primary Mentor: Rebecca Koskela
Secondary Mentor: Jane Greenberg
Becky is an M.F.A. candidate in Book Arts at the University of Alabama. She received her B.F.A. in Media Arts from Pratt Institute in Brooklyn, NY in 2002. Becky's background is in documentary television, working for clients including National Geographic, Discovery, and PBS. Her experience includes shooting and producing in locations such as Svalbard, South Africa, and India. This exposure to many cultures continues to influence her work.
The intern will be responsible for creating short videos that present stories about the real-life experiences of researchers. The videos will be based on stories that have been collected as part of DataONE’s Data Stories project, and can be accessed online at https://notebooks.dataone.org/data-stories/. The intern selected for this project will work in close collaboration with the Data Stories project leader during the concept development and review stages of the video project.
CREATIVE LICENSE: The intern will have broad creative freedom regarding the style and technologies used to create the videos. The intern may wish to use online video creation tools (e.g., PowToon, Voki) or other software to which they have access.
Primary Mentor: Stacy Rebich-Hespanha
Secondary Mentor: Amber Budden
Heather Heinz holds an MSci in Biology from Villanova University with a focus on the phylogeography of reptiles. A lifelong science educator, she is known for her enthusiasm and desire to spread curiosity about the natural world to students and the general public alike. When not traveling the world in search of lizards, Heather enjoys a variety of creative pursuits, including photography and writing.
DataONE has developed significant tools and resources of value to the research community. Many of these have documentation associated with them, but none have readily available public demos or screencasts. As part of this project the intern will develop a process for creating screencast tutorials, including identification of appropriate software, workflow process, and timeline. Screencasts for a single tool/resource will be broken down into multiple chapters, and the intern will also explore appropriate timings for a positive user experience. Draft screencasts will be tested and evaluated by members of the DataONE community, and the intern will coordinate this survey and feedback effort, incorporating suggestions into additional development activity. Completed screencasts will be published on the DataONE public website and uploaded to Vimeo / YouTube under a CC-0 license. Potential tools and resources include: ONEMercury (https://cn.dataone.org/onemercury/), ONE-R (http://releases.dataone.org/online/dataone_r/), DataONE Best Practices Database (http://www.dataone.org/best-practices), ONEDrive (not yet released), Ask DataONE (https://ask.dataone.org/questions/). Evaluation of suitability and prioritization of these resources will be one of the first activities of the intern, in collaboration with the DataONE community. Note: It is not anticipated that all the above resources will be covered during the period of the internship.
Primary Mentor: Amber Budden
Secondary Mentor: Carly Strasser
Katie just finished her second year in the Information Technology and Web Science Masters program at Rensselaer Polytechnic in upstate New York. Her upbringing in northern Virginia, just outside of Washington, DC, meant that she had seen snow prior to this winter, but New York was still quite a change weather-wise from where she completed her undergraduate work, in St. Petersburg, Florida. There, at Eckerd College, she earned a degree in Computer Science, as well as in East Asian Studies, and Modern Languages. In her free time, she enjoys good science fiction; bad puns; watching hockey and baseball; and embarking on cooking adventures.
This project aims to make data ingest and annotation easy for a wide range of users. A Semantic Annotator was prototyped last summer to provide a vision of how this can work in the DataONE environment. That tool enables earth and environmental scientists to annotate their data and link their data to relevant ontology concepts. However, users have to complete the whole process in one continuous session, and there are no security policies to protect their privacy. One way to solve the problem is to integrate user management into the Semantic Annotator to help preserve annotation results for re-use between sessions. This will make it easier to annotate data at the appropriate time and also allow multiple people to collaborate on the task. It also preserves provenance, so that the system may maintain a record of who made updates and when. The scope of this task includes:
- enabling user account functionality, whereby users register in order to gain permission to use the application; usage history data will be stored for later reuse.
- enabling the loading of user-specified enhancement parameters to link into existing datasets in order to reduce repetitive annotation, which will be especially useful when processing datasets with the same or similar metadata.
- enabling "semantic palette"/"my favorites" facets to preserve a user's frequently used classes or properties for re-use between sessions, which will improve the efficiency of users' annotation.
- implementing data access/security, so that users can control who sees the data.
Primary Mentor: Deborah McGuinness
Secondary Mentor: Xixi Luo
Yue (Robin) Liu is a first-year PhD student in Computer Science at Rensselaer Polytechnic Institute. He received his master's degree in Information Technology & Web Science at Rensselaer Polytechnic Institute and joined the Tetherless World Constellation group (http://tw.rpi.edu/) after graduation. His research interests include linked data applications, scientific and web data analysis and management, and machine learning.
This project aims to make data ingest and annotation easy for a wide range of users. A Semantic Annotator was prototyped last summer to provide a vision of how this would work. That tool provided a predefined set of ontologies, including OBOE, PROV-O, and others, into which earth and environmental scientists could link their data. It also enabled users to load and use additional ontologies. This functionality required users to be familiar with the specific ontologies they intended to use. However, there are some situations in which users may wish to discover relevant ontologies in the process of creating annotations. Integrating ontology search and recommendation into the Semantic Annotator can help to achieve this goal. The scope of this task includes: (1) enabling ontology search (internal or external), integrated with external services such as the search feature of Linked Open Vocabularies, to enable user search based on keywords in order to find appropriate classes, properties, etc. (2) enabling the "individuals" facet, which allows the user to click on a class in order to see instances of the class (3) enabling ontology recommendation based on usage history: if a dataset has the same or similar metadata as a previous dataset, then the previously discovered or used ontologies will be recommended to the current user.
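Item (3) above, recommendation based on usage history, could be sketched as follows: score past datasets by the overlap of their metadata keywords with the new dataset's keywords (Jaccard similarity is one simple choice, assumed here), and surface the ontologies used on sufficiently similar datasets. The function names and the similarity threshold are illustrative assumptions, not part of the prototype.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of metadata keywords."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend_ontologies(new_keywords, history, threshold=0.3):
    """Recommend ontologies used on past datasets whose metadata
    keywords overlap with the new dataset's keywords.

    history: list of (keywords, ontologies_used) pairs from prior sessions.
    Ontologies are ranked by the best similarity score of any past
    dataset they were used on.
    """
    scored = {}
    for keywords, ontologies in history:
        sim = jaccard(new_keywords, keywords)
        if sim >= threshold:
            for onto in ontologies:
                scored[onto] = max(scored.get(onto, 0.0), sim)
    return sorted(scored, key=scored.get, reverse=True)
```

A production version would likely replace keyword overlap with a richer metadata-similarity measure and draw candidates from an external index such as Linked Open Vocabularies.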
Primary Mentor: Deborah McGuinness
Secondary Mentor: Xixi Luo
James Michaelis is a PhD candidate at Rensselaer Polytechnic Institute. His research focuses on the evaluation of interface usability for provenance querying systems, as well as corresponding development of methods for measuring interface usability. He holds an M.S. in Computer Science from Rensselaer Polytechnic Institute and a B.S. in Computer Science from the University of Colorado at Boulder.
This project aims to enable and provide provenance tracing during the access of science data using the OPeNDAP Hyrax software stack. The provenance provided will express the various OPeNDAP Hyrax components that have been used to generate data products used by scientists and researchers. In addition to provenance for the data access component, scientists and researchers will also have the ability to ping-back (using the W3C Provenance Working Group recommendation) to the repository in order to provide attribution and citation of the data products and originating data sets used in publications and papers. The successful candidate will complete the development of the software by the deadline, test the software with DataONE sites using OPeNDAP Hyrax for data access to ensure the robustness of the resulting work, and document the work they have completed by writing implementation documentation and blogs related to their experience in the project. The candidate will also assist with any related publications.
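The ping-back mechanism referenced here is, to my understanding, the one described in the W3C PROV-AQ note, where a data service advertises a ping-back URI in an HTTP Link header and consumers POST provenance about their downstream use to it. The sketch below only builds and parses such a header; the URIs are made up, and this is an assumption-laden illustration rather than the project's actual implementation.

```python
# Link relation defined by W3C PROV-AQ for provenance ping-back (assumed here).
PROV_PINGBACK = "http://www.w3.org/ns/prov#pingback"

def pingback_link_header(pingback_uri):
    """Build the HTTP Link header a data service (e.g. a Hyrax server)
    could attach to a response, advertising where provenance ping-backs
    may be POSTed."""
    return f'<{pingback_uri}>; rel="{PROV_PINGBACK}"'

def extract_pingback_uri(link_header):
    """Pull the ping-back target out of a Link header, if present."""
    for part in link_header.split(","):
        if PROV_PINGBACK in part:
            return part.split(">")[0].split("<")[1]
    return None
```

A consuming client would then POST PROV statements describing its derived products (e.g. figures in a paper) to the extracted URI, giving the repository forward traceability.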
Primary Mentor: Patrick West
Secondary Mentor: Deborah McGuinness
Yan Gao is a second-year master's student in Information Science at the University of Texas at Austin. He has a bachelor's degree in Information Management and Information Systems from Beijing University of Aeronautics and Astronautics. His research interests include, but are not limited to, data mining, data visualization, human computation and crowdsourcing, and information retrieval. He is now preparing to apply for a PhD program focused on data mining and machine learning.
Numerical modeling has become an important technique for extrapolating local observations and understanding of the Earth system and climate change to larger spatial and temporal regions. For example, terrestrial biosphere models (TBMs) are crucial tools for furthering understanding of how terrestrial carbon is stored and exchanged with the atmosphere across a variety of spatial and temporal scales. But due to the complexity of the Earth system, TBM estimates vary widely. Regional- and global-scale TBMs run simulations at millions of locations over thousands of time steps; they not only require extensive computing resources (e.g., supercomputers) but also generate terabytes of output data. Inter-comparison among different TBM estimates, and comparison against observations, is important for finding model-model and model-observation agreements/disagreements, which can provide feedback to the modeling community for improving model skill.
Analyzing such a huge amount of data is a typical big data challenge. The DataONE Exploration, Analysis, and Visualization (EVA) working group has been leveraging advanced big data exploration and visualization techniques to tackle this challenge and has started the design and development of SimilarityExplorer. The SimilarityExplorer tool leverages multi-dimensional projection and synchronized spatiotemporal correlation techniques to let users conveniently explore and visualize similarities and differences among complex multi-dimensional, multi-scale, and multi-variable environmental data, and how those similarities and differences change across regions and over time.
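At its core, the model inter-comparison described above reduces to computing pairwise similarity between model outputs over matched space-time points. A minimal sketch of that step, using Pearson correlation over flattened output series (toy data and names; SimilarityExplorer itself works on far larger gridded datasets and adds projection and synchronized views on top):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length series of model output."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def similarity_matrix(models):
    """Pairwise correlation among named model outputs, e.g. carbon-flux
    estimates flattened over space and time.  Returns a dict keyed by
    (model_a, model_b) pairs."""
    names = sorted(models)
    return {(a, b): pearson(models[a], models[b]) for a in names for b in names}
```

Restricting the series to one region or time window before correlating gives the "how similarity changes across regions and over time" view the tool aims to support.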
Primary Mentor: Bob Cook
Secondary Mentor: Yaxing Wei
Heejun Kim is a PhD student in the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill. He has an MS degree in Geographic Information Science (GIS) from the University of Illinois at Urbana-Champaign. His research interests are in information extraction from user-generated content, credibility assessment in participatory environments, geographic information, and disaster management. In his spare time, Heejun likes swimming, cooking, and playing with his two sons.
Citizen science is a novel “instrument” that can gather data unavailable to scientists using traditional methods. Recent advances in internet technologies and mobile computing are accelerating the possibilities for engaging the public in scientific research. Projects such as eBird and CoCoRaHS are providing data that is proving exceptionally valuable to scientists. Nevertheless, both the scientific community and the public remain skeptical about the quality of citizen science data because citizens by definition are not “certified” scientists. To overcome this hurdle, citizen science projects use a variety of approaches to improve and document data quality. The candidate will mine the web and use a literature review process (perhaps combined with web surveys and/or personal interviews) to document citizen participation, data collection procedures, and analysis tasks undertaken by citizens. Building on the efforts of Wiggins et al. 2011 (Mechanisms for Data Quality and Validation in Citizen Science), the candidate will document methods to improve and document data quality. One goal will be to define data models and data quality models that can guide citizen science projects in the future. A second goal will be to better understand and define the essential tension between improving data quality (having citizens work like scientists) and keeping citizens engaged in the project.
Primary Mentor: Rob Stevenson
Secondary Mentor: Greg Newman
Yurong He is a Ph.D. candidate at the University of Maryland iSchool and Human-Computer Interaction Lab. She is also a research assistant working on the Biotracker project (http://biotracker.umd.edu/). Her research focuses on the collaborative effort between scientists and citizen scientists in advancing knowledge of biodiversity in online environments. In her spare time, she enjoys spending time in nature and having delicious food with her family and friends.
Citizen science is a novel “instrument” that can gather data unavailable to scientists using traditional methods. Recent advances in internet technologies and mobile computing are accelerating the possibilities for engaging the public in scientific research. Projects such as eBird and CoCoRaHS are providing data that is proving exceptionally valuable to scientists. Nevertheless, both the scientific community and the public remain skeptical about the quality of citizen science data because citizens by definition are not “certified” scientists. To overcome this hurdle, citizen science projects use a variety of approaches to improve and document data quality. The candidate will mine the web and use a literature review process (perhaps combined with web surveys and/or personal interviews) to document citizen participation, data collection procedures, and analysis tasks undertaken by citizens. Building on the efforts of Wiggins et al. 2011 (Mechanisms for Data Quality and Validation in Citizen Science), the candidate will document methods to improve and document data quality. One goal will be to define data models and data quality models that can guide citizen science projects in the future. A second goal will be to better understand and define the essential tension between improving data quality (having citizens work like scientists) and keeping citizens engaged in the project.
Primary Mentor: Rob Stevenson
Secondary Mentor: Greg Newman
Tianhong Song is a PhD candidate in Computer Science at the University of California, Davis. He received his BS in Applied Biological Science and Enterprise Management from Zhejiang University. His research interests include workflow design and analysis, large-scale data analysis and integration, and algorithm design and optimization in general.
Capture and analysis of provenance from scientific workflow environments and databases is well studied and understood. Scientists can employ provenance information to better understand, debug, and document their findings, and thus greatly simplify and enhance reproducible science. Digital notebooks such as IPython can be understood as a new way of marrying ideas from high-level scripting, interactive workflows, and even “executable papers”: similar to the idea of Literate Programming, the digital notebook combines both documentation (the paper) and the code to produce the results. Thus, by design, they are self-documenting and greatly enhance transparency and reproducibility. However, the provenance models used for digital notebooks are less well studied than those for databases and workflow systems. Capturing the provenance of data obtained during multiple interactive sessions is therefore one of the enablers of emerging, dynamic models of scholarly communication. Based on existing and to-be-developed notebooks, the project will investigate the potential for provenance capture in these environments, identifying technical challenges and assessing both complexities and opportunities. Furthermore, it will explore the modeling of provenance in such contexts and the adaptation of existing querying and analysis techniques.
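One way to make the notebook-provenance problem concrete: record each cell execution as a PROV-style activity with the variables it read and wrote, then answer lineage queries over that log. The sketch below is a toy model under stated assumptions (cells declare their inputs/outputs explicitly; real notebook kernels would need to infer these), not a description of any existing IPython facility.

```python
import time
from dataclasses import dataclass, field

@dataclass
class CellExecution:
    """One PROV-style activity: a single run of a notebook cell."""
    cell_id: str
    code: str
    inputs: list      # variable names the cell read
    outputs: list     # variable names the cell wrote
    timestamp: float = field(default_factory=time.time)

class NotebookProvenance:
    """Accumulates cell executions across interactive sessions and
    answers lineage queries over the variables they produced."""

    def __init__(self):
        self.log = []

    def record(self, cell_id, code, inputs, outputs):
        self.log.append(CellExecution(cell_id, code, list(inputs), list(outputs)))

    def lineage(self, var):
        """Cell ids (most recent writers first) that a variable derives
        from, following input/output dependencies transitively."""
        result, frontier = [], [var]
        for ex in reversed(self.log):
            if any(v in ex.outputs for v in frontier):
                result.append(ex.cell_id)
                frontier += ex.inputs
        return result
```

Because the log survives across sessions, re-running or editing a cell simply appends a new activity, which is exactly the multi-session provenance the project aims to study.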
Primary Mentor: Bertram Ludäscher
Secondary Mentor: Paolo Missier