data mining

Velocity is a Java-based template engine. Its template language references objects defined in Java code. When Velocity is used for web development, web designers can work in parallel with Java programmers to develop web sites.

Velocity has broader uses, such as generation of SQL, PostScript and XML from templates. It can be used either as a standalone utility for generating source code and reports, or as an integrated component of other systems.

iMacros was designed to automate the most repetitious tasks on the web. With iMacros, you can quickly fill out web forms, remember passwords, create a webmail notifier, download information from other sites, scrape the Web (get data from multiple sites), and more. You can keep the macros on your computer for your own use, or share them with others by embedding them on your homepage, blog, company Intranet or any social bookmarking service.

Mathematica is a computational platform used by scientists, engineers and mathematicians. Mathematica supports equation solving, numerical analysis, graphing and visualization. It has import and export filters for tabular data, images, video, sound, CAD, GIS documents and biomedical formats. It also supports data mining tools such as cluster analysis, sequence alignment and pattern matching, as well as text mining.

Microsoft SQL Server Analysis Services (SSAS) is part of Microsoft SQL Server, which is a relational database management system (RDBMS). SSAS contains online analytical processing (OLAP) and data mining functionality for business intelligence applications. Many of the business intelligence and data mining functions within SSAS are applicable to environmental datasets.

OAIster is a freely accessible search engine for open access web resources, available from OCLC. OAIster uses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to harvest records from websites. OAIster contains over 25 million records from all disciplines and subjects contributed by over 1,000 libraries, archives, and repositories. The records harvested by OAIster use the unqualified Dublin Core metadata format.
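As a rough illustration of what an OAI-PMH harvest involves, the sketch below builds a `ListRecords` request for unqualified Dublin Core (`oai_dc`) and pulls titles out of a response. The endpoint URL, the helper names, and the sample response are assumptions made up for this example; they do not describe OAIster's own harvester, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(xml_text):
    """Return the first dc:title of every record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iterfind(".//oai:record", NS):
        title = record.find(".//dc:title", NS)
        if title is not None and title.text:
            titles.append(title.text.strip())
    return titles


# Invented sample response in the shape a repository would return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Stream temperature observations</dc:title>
          <dc:creator>Example Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to fetch the URL and follow the `resumptionToken` that repositories return when a result set spans several responses; both are left out here to keep the sketch short.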

Outwit is a suite of Firefox extensions that allows you to harvest materials on the web. The suite currently includes Outwit Hub, Outwit Images, and Outwit Docs.

SAS is an integrated system of software that enables everything from data access across multiple sources to complex manipulations of data files to performance of sophisticated statistical analyses and data visualizations. Three of SAS' most popular software products that are commonly used by ecologists are Base SAS, SAS/STAT, and SAS/GRAPH. SAS is available for Windows and UNIX platforms.

SAS Enterprise Miner streamlines the data mining process to create predictive and descriptive models based on analysis of large amounts of data. Data can be accessed from local files or from remote database connections. The software uses a point-and-click interactive interface to create workflows and analysis diagrams, and then execute them. SAS Enterprise Miner can transform and manipulate data using filters and statistical analyses to extract desired data from large datasets.

Spotfire Miner is software for data mining of large datasets. It is sold commercially by TIBCO.

Users can connect to remote or local datasets, apply statistical and methodological filters, clean and transform the data, and finally apply a model to produce the desired mined data. Statistical models include clustering, regression analysis, and principal components analysis. Models based on historical data can then be used to predict future results based on newly mined data.
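The clean, fit, and predict sequence described above can be sketched in a few lines. This is a generic illustration in plain Python, not Spotfire Miner's interface: the function names and the small dataset are invented for the example, and ordinary least-squares regression stands in for whichever model a user would choose.

```python
def clean(rows):
    """Drop rows that are missing either the predictor or the response."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_line(rows):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new predictor value."""
    slope, intercept = model
    return slope * x + intercept


# Invented "historical" data with two incomplete rows to be filtered out.
historical = [
    (1.0, 3.0),
    (2.0, 5.0),
    (None, 4.0),
    (3.0, 7.0),
    (4.0, None),
    (4.0, 9.0),
]

model = fit_line(clean(historical))

if __name__ == "__main__":
    # Use the model built from historical data on a newly observed value.
    print(predict(model, 5.0))
```

The same three stages apply to clustering or principal components analysis; only the model-fitting step changes.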

STATISTICA is a proprietary analytical software package developed by StatSoft that includes data visualization, data analysis, data management, and data mining tools. It is primarily a graphical user interface (GUI) application.

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox and data assimilation system that synthesizes information contained in ecological models, data, and expert knowledge. This is done using modern statistical methods and state-of-the-art ecosystem models. PEcAn has a web interface that enables users to run ecosystem models, as well as a suite of R packages that can be used for model-data fusion and more sophisticated analysis.

WEKA is a data mining tool. It is a collection of standard machine learning algorithms organized and presented to the user as a workbench. The algorithms can be applied directly to a dataset from the workbench or called from Java code. New classifiers, filters, and other components can be added through the GUI.

WEKA is written in Java and runs on platforms that support Java. It is available under the GNU General Public License (GPL).

