Metacat is a flexible, open-source metadata catalog and data repository that targets scientific data, particularly from ecology and environmental science. Metacat accepts XML as a common syntax for representing the large number of metadata content standards that are relevant to ecology and other sciences. As a result, Metacat works as a generic XML database: it can store, query, and retrieve arbitrary XML documents without prior knowledge of their XML schema.

Metacat is designed and implemented as a Java servlet application that uses a relational database management system (RDBMS) to store XML and associated meta-level information. The recommended installation uses Apache Tomcat for servlet management and PostgreSQL as the underlying RDBMS, although other configurations are possible. Metacat provides a rich client Application Programming Interface (API) that is available in a variety of languages, including Java, Python, and Perl.
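The idea of keeping arbitrary XML in a relational database without knowing its schema in advance can be illustrated with a small sketch. The example below is not Metacat's code or its actual table layout. It assumes a single hypothetical table, xml_node, with one row per element or attribute, and it uses Python's built-in SQLite module in place of PostgreSQL so that it runs without any setup. The function names create_store, store_document, and find_documents, and the document identifiers, are likewise made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def create_store(conn):
    """Create one generic node table (hypothetical layout, not Metacat's)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS xml_node (
            node_id   INTEGER PRIMARY KEY,
            doc_id    TEXT NOT NULL,
            parent_id INTEGER,
            kind      TEXT NOT NULL,
            name      TEXT NOT NULL,
            value     TEXT
        )
        """
    )


def store_document(conn, doc_id, xml_text):
    """Break an XML document into rows; no schema knowledge is needed."""
    root = ET.fromstring(xml_text)

    def insert(kind, name, value, parent_id):
        cur = conn.execute(
            "INSERT INTO xml_node (doc_id, parent_id, kind, name, value) "
            "VALUES (?, ?, ?, ?, ?)",
            (doc_id, parent_id, kind, name, value),
        )
        return cur.lastrowid

    def walk(element, parent_id):
        # Keep only the leading text of each element to stay short.
        text = (element.text or "").strip() or None
        node_id = insert("element", element.tag, text, parent_id)
        for attr_name, attr_value in element.attrib.items():
            insert("attribute", attr_name, attr_value, node_id)
        for child in element:
            walk(child, node_id)

    walk(root, None)
    conn.commit()


def find_documents(conn, element_name, text):
    """Return ids of documents with a matching element name and text."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM xml_node "
        "WHERE kind = 'element' AND name = ? AND value LIKE ? "
        "ORDER BY doc_id",
        (element_name, f"%{text}%"),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_store(conn)
    # Two documents with unrelated structures go into the same table.
    store_document(conn, "demo.1.1",
                   "<dataset><title>Kelp forest survey</title></dataset>")
    store_document(conn, "demo.2.1",
                   "<record id='7'><name>Soil cores</name></record>")
    print(find_documents(conn, "title", "Kelp"))
```

Because every document is reduced to the same generic rows, a metadata standard with a different structure needs no new tables. The trade-off in a design like this is that queries run against node names and values rather than typed columns.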

Metacat is used extensively around the world to manage environmental data. It is a key infrastructure component of the NCEAS data catalog, the Knowledge Network for Biocomplexity (KNB) data catalog, and the DataONE system, among others.

Technical Expertise Required: 
Basic programming skills
Additional Information: 
  • Berkley, C., M. Jones, J. Bojilova, and D. Higgins, 2001. Metacat: A schema-independent XML database system. 13th Intl. Conference on Scientific and Statistical Database Management: 171.
  • Jones, M.B., C. Berkley, J. Bojilova, and M. Schildhauer, 2001. Managing scientific metadata. IEEE Internet Computing 5(5): 59-68.
  • Metacat Administrator's Guide (http://knb.ecoinformatics.org/software/dist/MetacatAdministratorGuide.pdf)