Latest News and Events

The SAMSI-FODAVA Workshop on Interactive Visualization and Analysis of Massive Data will be held on December 10-12, 2012.
Posted: October 02, 2012
The FODAVA Annual Meeting (Dec 12-13) will immediately follow the SAMSI/FODAVA joint workshop at the same location.
Posted: September 05, 2012
Many of the modern data sets such as text and image data can be represented in high-dimensional vector spaces and have benefited from computational methods that utilize advanced techniques from num
Posted: June 30, 2012


The goal of this research is to combine two areas: Visual Analytics and Bayesian Statistics. Currently, visualizations display inflexible deterministic transformations of data that inherently separate data visualization from visual synthesis. Analysts cannot manipulate displays to inject domain-specific knowledge and formally assess how their expert judgment merges with the data. However, if the data transformation is changed from a deterministic one to a probabilistic, Bayesian one, manipulations of a display can be interpreted quantitatively.

Viruses are contagious agents and can cause epidemics and pandemics. The importance of the prevention and control of viral epidemics and pandemics to homeland security and daily life cannot be overemphasized. Viruses cannot grow or reproduce outside host cells. Their infection starts with the attachment of a virus to the host cell surface, with possible fusion of the viral capsid surface and the host cellular membrane, followed by virus penetration into the host cell.

The FODAVA (Foundations of Data Analysis and Visualization) Lead research team at the Georgia Institute of Technology brings together the expertise needed to lead the FODAVA effort, including machine learning and computational statistics, information visualization, massive-dataset algorithms and data structures, and optimization theory. The team focuses on the fundamental theory and approaches needed to make breakthroughs in data representations and transformations.

The ubiquitous phenomenon of massive data (including data streams) imposes considerable challenges in data visualization and exploratory data analysis. About 15 years ago, terabyte datasets were still considered 'ridiculous.' However, modern datasets managed by the Stanford Linear Accelerator Center (SLAC), NASA, NSA, etc. have reached the petabyte scale or larger. Corporations such as Amazon, Wal-Mart, eBay, and search engine firms are also major generators and users of massive data.

Modern direct manipulation and visualization systems have made key strides in bringing powerful data transformations and algorithms to the analyst's desktop. But to further promote the vision of powerful visual analytics, wherein automated algorithms and visual representations complement each other to yield new insight, we must continually increase the expressiveness with which analysts interact with data. This project focuses on the task of storytelling, that is, the stringing together of seemingly unconnected pieces of data into a coherent thread or argument.

This is a collaborative research effort bringing together the expertise of Lise Getoor, University of Maryland College Park (0937094), Alex Pang, University of California-Santa Cruz (0937073), and Lisa Singh, Georgetown University (0937070).

Over the past decade, the precipitous drop in the cost of disk storage and the build-up of world-wide high-bandwidth fiber optic communications have made massive amounts of data of different modalities (text, images, video) easily available to everyone over the Web. In science, engineering, business, and medicine, high-bandwidth sensors, large-scale simulations, and data collection bots generate immense data sets that need to be analyzed. Making sense of all this disparate data is becoming increasingly difficult.

Finding and labeling semantic patterns in large, spatial data sets is one of the most important problems facing computer scientists today. Massive spatial data sets are being acquired in almost every scientific discipline, such as medicine, geology, biology, astrophysics, and others. Finding meaningful patterns in those data is often the bottleneck to scientific discovery. The proposed research is to develop a transformative machine learning methodology, where the process of discovering semantic patterns in large spatial data sets is interactive and semi-autonomous.

As the availability and size of digital information repositories continue to burgeon, the problem of extracting deep semantic structure from high-dimensional data becomes more critical. This project addresses the fundamental problem of transfer learning; in particular, it investigates methods for aligning multiple heterogeneous data sets to find correspondences and extract shared latent semantic structure.

The analysis of large high-dimensional data sets and graphs is motivated by many important applications, such as the study of databases of images and documents, and the modeling of complex dynamical systems (e.g. transaction data, weather patterns, molecular dynamics). This research involves the development of novel mathematical techniques for extracting and visualizing information from large data sets.