Metadata (data that describe other data) accompany every scientific data set, and their richness, quality, and interoperability determine how well data sets can be found, understood, and reused. Several standards are used to organize metadata, and their structural differences create difficulties for data sharing and other collaborative efforts. Metadata crosswalks lower this barrier to interoperability by providing a map between one standard and another. Crosswalks are difficult to generate because they often involve too many terms to map by hand, and the term definitions are written in natural language that cannot be effectively processed by software alone. I have written a program with which a user can construct and evaluate processing algorithms without writing new code. This tool allows users to quickly compare algorithms suited to building crosswalks between the metadata standards that are relevant to their work.
Primary Mentor: Bruce Wilson
Secondary Mentors: Dave Vieglais, Giri Palanisamy
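The idea of assembling a processing algorithm from existing steps and then scoring the crosswalk it produces can be sketched as follows. This is only an illustrative sketch, not the program described in the abstract: the step functions, the Jaccard similarity measure, the `build_crosswalk` and `accuracy` helpers, and the sample terms and descriptions are all assumptions made up for the example.

```python
import re

# Hypothetical processing steps; each one maps a string to a string.
def lowercase(text):
    return text.lower()

def strip_punctuation(text):
    return re.sub(r"[^\w\s]", " ", text)

STOP_WORDS = {"a", "an", "the", "of", "or", "and", "to", "for", "in"}

def remove_stop_words(text):
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

def run_pipeline(steps, text):
    """Apply the chosen steps in order; an 'algorithm' is just a list of steps."""
    for step in steps:
        text = step(text)
    return text

def jaccard(a, b):
    """Word-set overlap between two processed descriptions."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def build_crosswalk(steps, source, target):
    """Map each source term to the target term with the most similar description."""
    processed_target = {t: run_pipeline(steps, d) for t, d in target.items()}
    mapping = {}
    for term, description in source.items():
        processed = run_pipeline(steps, description)
        mapping[term] = max(
            processed_target,
            key=lambda t: jaccard(processed, processed_target[t]),
        )
    return mapping

def accuracy(mapping, reference):
    """Fraction of reference pairs that the generated crosswalk reproduces."""
    correct = sum(1 for term, expected in reference.items()
                  if mapping.get(term) == expected)
    return correct / len(reference)

# Made-up terms and descriptions, not taken from any real standard.
source = {"creator": "The person who made the resource.",
          "date": "A point in time for the resource."}
target = {"author": "Person who made the data set.",
          "published": "Point in time of publication."}
reference = {"creator": "author", "date": "published"}

algorithm = [lowercase, strip_punctuation, remove_stop_words]
crosswalk = build_crosswalk(algorithm, source, target)
print(crosswalk, accuracy(crosswalk, reference))
```

Because an algorithm is only a list of steps, two candidate algorithms can be compared by calling `build_crosswalk` with each list and comparing the resulting `accuracy` values against the same hand-made reference mapping.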
EarthGrid is a lightweight Application Programming Interface (API) for web-based communication with diverse environmental data systems. Until now, it was supported only through SOAP, which imposes significant overhead on client programmers and suffers from uneven support across programming languages. This project involved refactoring the current SOAP-based EarthGrid API into a REST-based architecture. REST exploits the existing technology and protocols of the Web, while decoupling the implementation of a system from its interface. I implemented a prototype within the Metacat data management system to demonstrate the effectiveness of this architecture. The results of this project will serve as a basis for the first Virtual Data Center REST API, which is currently being designed.
Primary Mentor: Hilmar Lapp
Secondary Mentor: Rutger Vos
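To make the client-side difference between the two styles in the EarthGrid abstract concrete, the sketch below builds the same "fetch one object" request both ways, without sending anything over the network. The operation name `get`, the `/object/` path, the `EXAMPLE_NS` namespace, and the `example.org` base URL are hypothetical placeholders, not the actual EarthGrid or Metacat interface; only the SOAP 1.1 envelope namespace is a real, fixed value.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
EXAMPLE_NS = "http://example.org/earthgrid"            # hypothetical service namespace

def build_soap_request(identifier):
    """Wrap the call in an XML envelope that the client must build and later parse."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{EXAMPLE_NS}}}get")
    ident = ET.SubElement(operation, f"{{{EXAMPLE_NS}}}identifier")
    ident.text = identifier
    return ET.tostring(envelope, encoding="unicode")

def build_rest_request(base_url, identifier):
    """Express the same call as an HTTP method plus a resource URL."""
    url = f"{base_url.rstrip('/')}/object/{quote(identifier, safe='')}"
    return "GET", url

soap_body = build_soap_request("doc.1.1")
method, url = build_rest_request("https://example.org/api/", "doc.1.1")
print(len(soap_body), "characters of SOAP envelope")
print(method, url)
```

In the REST form the identifier is part of the URL and the operation is the HTTP verb, so any HTTP client or a web browser can issue the request; the SOAP form needs XML tooling on both ends even for this simple lookup.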
Scientific web services produce and consume data in many different, loosely defined formats and messaging protocols. This heterogeneity makes it especially difficult for services to be discovered and used in applications. This project aimed to address the problem by mapping components of a web service interface, such as input and output data elements and operations, to concepts in a well-defined and structured semantic model (i.e., an ontology). This gives prospective consumers of a web service access to the service’s functionality even if they are unfamiliar with the specific syntax of its service interfaces. To demonstrate the advantages of attaching semantic meaning to scientific web services, I implemented a rudimentary proof-of-concept service using the latest versions of the Web Services Description Language (WSDL 2.0) and Semantic Annotations for WSDL (SAWSDL) standard recommendations, together with the data exchange (NeXML), data semantics (CDAO), and data access (phyloWS) standards for phylogenetics recently developed by the NESCent EvoInfo Working Group.
Primary Mentor: Matt Jones
Secondary Mentor: Mark Sevilla
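The mapping from interface components to ontology concepts can be illustrated with the SAWSDL `modelReference` attribute, which attaches one or more concept URIs to a schema or WSDL component. The sketch below is a minimal illustration and not the proof-of-concept service itself: the schema fragment, the element names `treeId` and `newickString`, the `annotate` and `find_by_concept` helpers, and the `example.org` concept URI are all invented for the example and are not taken from NeXML, CDAO, or phyloWS.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
MODEL_REF = f"{{{SAWSDL_NS}}}modelReference"

# Hypothetical schema fragment describing a service's output elements.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="treeId" type="xs:string"/>
  <xs:element name="newickString" type="xs:string"/>
</xs:schema>"""

def annotate(schema_root, element_name, concept_uri):
    """Add a concept URI to the element's space-separated modelReference list."""
    for element in schema_root.iter(f"{{{XSD_NS}}}element"):
        if element.get("name") == element_name:
            references = element.get(MODEL_REF, "").split()
            if concept_uri not in references:
                references.append(concept_uri)
            element.set(MODEL_REF, " ".join(references))
            return element
    raise KeyError(element_name)

def find_by_concept(schema_root, concept_uri):
    """Return names of elements annotated with the concept, whatever they are called."""
    return [element.get("name")
            for element in schema_root.iter(f"{{{XSD_NS}}}element")
            if concept_uri in element.get(MODEL_REF, "").split()]

TREE_CONCEPT = "http://example.org/ontology#Tree"  # hypothetical concept URI

root = ET.fromstring(SCHEMA)
annotate(root, "newickString", TREE_CONCEPT)
print(find_by_concept(root, TREE_CONCEPT))
```

A consumer that knows only the concept URI can locate the relevant element by its annotation, which is the sense in which the service becomes usable without knowledge of its specific element names.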
My project consisted of implementing a tool that helps map semantically similar terms between metadata vocabularies, which is important when integrating or linking data sets annotated with different and typically incompatible metadata standards. The tool I created takes two sets of terms and their descriptions as input and returns a ranked list of candidate matches between the two sets. For the implementation, I used the Lucene indexing system and the General Text Matcher libraries, and I wrote parsers for several metadata standards, including Dublin Core, Darwin Core, and the Ecological Metadata Language.
Primary Mentor: Dave Vieglais
Secondary Mentor: Bruce Wilson
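The input and output of the term-matching tool (two sets of terms with descriptions in, a ranked list of candidate pairs out) can be sketched with a simple similarity measure. The sketch below swaps in a plain TF-IDF cosine similarity written with the Python standard library; it does not use Lucene or the General Text Matcher, and the function names and sample descriptions are assumptions made up for illustration rather than text from Dublin Core, Darwin Core, or the Ecological Metadata Language.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per description."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    return {
        name: {tok: tf * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
               for tok, tf in term_counts.items()}
        for name, term_counts in counts.items()
    }

def cosine(u, v):
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_matches(set_a, set_b):
    """Return (term_a, term_b, score) for every pair, best candidates first."""
    docs = {("a", term): text for term, text in set_a.items()}
    docs.update({("b", term): text for term, text in set_b.items()})
    vectors = tfidf_vectors(docs)
    pairs = [(a, b, cosine(vectors[("a", a)], vectors[("b", b)]))
             for a in set_a for b in set_b]
    # Sort by descending score; break ties by name so the order is deterministic.
    pairs.sort(key=lambda p: (-p[2], p[0], p[1]))
    return pairs

# Made-up vocabularies, not real standard definitions.
set_a = {"creator": "person who made the resource",
         "date": "point in time for the resource"}
set_b = {"author": "person who made the data set",
         "published": "point in time of publication"}

ranked = rank_matches(set_a, set_b)
for term_a, term_b, score in ranked:
    print(f"{term_a} -> {term_b}: {score:.3f}")
```

Returning every pair with its score, rather than a single best match per term, leaves the final decision to a person reviewing the ranked candidates.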