Similarity Explorer: Inspire Climate Science Discovery Through Advanced Big Data Analysis
Yan Gao is a second-year master's student in Information Science at the University of Texas at Austin. He holds a bachelor's degree in Information Management and Information Systems from Beijing University of Aeronautics and Astronautics. His research interests include, but are not limited to, data mining, data visualization, human computation and crowdsourcing, and information retrieval. He is currently preparing to apply to PhD programs focused on data mining and machine learning.
Numerical modeling has become an important technique for extrapolating local observations and understanding of the Earth system and climate change to larger spatial and temporal extents. For example, terrestrial biosphere models (TBMs) are crucial tools for understanding how terrestrial carbon is stored and exchanged with the atmosphere across a variety of spatial and temporal scales. Because of the complexity of the Earth system, however, TBM estimates vary widely. Regional- and global-scale TBMs run simulations at millions of locations over thousands of time steps. They not only require extensive computing resources (e.g., supercomputers) but also generate terabytes of output data. Comparing the estimates of different TBMs against one another and against observations is important for identifying model-model and model-observation agreements and disagreements, which in turn provide feedback to the modeling community for improving model skill.
Analyzing such a large volume of data is a typical big data challenge. To tackle it, the DataONE Exploration, Visualization, and Analysis (EVA) working group has been applying advanced big data exploration and visualization techniques and has started the design and development of SimilarityExplorer. The SimilarityExplorer tool uses multi-dimensional projection and synchronized spatiotemporal correlation techniques to let people conveniently explore and visualize the similarities and differences among complex multi-dimensional, multi-scale, and multi-variable environmental data, and to see how those similarities and differences change across regions and over time.
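To make the idea of projecting model similarity concrete, here is a minimal sketch in Python. It is not the SimilarityExplorer implementation: the synthetic data, the function names, the choice of 1 minus Pearson correlation as a dissimilarity measure, and the use of classical multidimensional scaling (MDS) as the projection are all assumptions made for illustration.

```python
import numpy as np


def pairwise_correlation_distance(fields):
    """Return an (n_models, n_models) matrix of 1 - Pearson correlation.

    `fields` is a hypothetical array of shape (n_models, n_time, n_lat, n_lon)
    holding one gridded output variable per model.
    """
    flat = fields.reshape(fields.shape[0], -1)  # one long vector per model
    return 1.0 - np.corrcoef(flat)


def classical_mds(dist, n_components=2):
    """Project a distance matrix to low-dimensional coordinates (classical MDS)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise from non-Euclidean distances.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


def demo():
    """Build four synthetic 'models' around a shared signal and project them."""
    rng = np.random.default_rng(0)
    base = rng.normal(size=(12, 4, 5))  # (time, lat, lon), purely synthetic
    noise_levels = (0.1, 0.2, 2.0, 2.5)  # assumed per-model noise scales
    models = np.stack(
        [base + scale * rng.normal(size=base.shape) for scale in noise_levels]
    )
    dist = pairwise_correlation_distance(models)
    coords = classical_mds(dist)
    return dist, coords


if __name__ == "__main__":
    dist, coords = demo()
    print(np.round(dist, 3))
    print(np.round(coords, 3))
```

In this sketch, models that agree closely end up near one another in the 2-D projection, while noisier models land farther away. Repeating the calculation per region or per time window would give a rough picture of how similarity changes across space and time.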