Algorithm Visualizations

•

Monte Carlo search algorithm for network inference

#1 (Summer, 2008): Visualization of Monte Carlo optimization for network inference

This Monte Carlo search algorithm searches through network structures by adding or removing one connection at a time in a systematic fashion. The algorithm executes step by step and the corresponding change is visualized in one of the four windows on the left showing:

(a) the best observed network

(b) the newest network being tried and its optimal parameter configuration

(d) the the network selected to replace the best observed network.

The goal is to link the steps of the algorithm (right) to an intuitive visual analog (left) and is intended to supplement understanding of the algorithmic procedure. This movie is not meant to be an exhaustive explanation of each step in the process. I hope that repeated viewing should provide an intuition for how the algorithm works and how the solution evolves over time. If this display were linked to an ongoing simulation and rendered in real time, it may have a real utility in aiding researchers in observing the convergence behaviors of this search on a given data set.

DESCRIPTION

The animation consists of 4 windows on the left, which visualize the status of the inference, and the algorithm explicitly written on the right. The top left window shows the best searched network topology; the point-of-departure topology. The top right window shows the various changes to the point-of-departure topology that are being sampled in the Monte Carlo search. The color of the arrows in this window correspond to the sign and magnitude of the interaction that is most optimum given the fixed topology and a data set. Each network is associated with a probability that is inversely proportional to the error (or fitness with the data), and this probability is plotted as a bar plot in the bottom left window. Finally, after N steps through the search, the Monte Carlo chooses one of those recently searched network topologies according to their probability. The choice is indicated with a red bar over the corresponding model index in the same window, and then the network topology is displayed in the bottom right window. The algorithm proceeds by looping through this iterative search and select method.

Belief Propagation algorithm for network inference

#2 (Summer, 2010): Visualization of one iteration of the Belief Propagation algorithm

Belief Propagation (BP) is an iterative algorithm for calculating probability distributions (also called marginals) for network parameters that represent the absence of presence of an interaction between nodes. The goal of this movie is to give the viewer an intuition about the iterative process, the convergence properties, the independent contribution of each factor (here labeled as experiments) to the marginals, and show the overall organization of this BP algorithm. The movie also gives the viewer a sense of coverage of the solution space before convergence.

DESCRIPTION

The substance of this movie is the result of the BP algorithm applied to infer network interactions (parameter values) on a small system of 5 nodes (variable nodes) from 10 training patterns or experiments. Each small plot is a probability distribution on a single parameter as constrained by the data in a single training pattern, also called a factor or message in the message passing community. The product over all factors is the marginal distribution, plotted on the bottom row. Throughout the movie, the factors and the corresponding marginals change, sometimes varying vastly from the true parameter value, until ultimate convergence around the true parameter.

#3 (Summer, 2011): Schematic illustration of the Belief Propagation update

Belief Propagation (BP) is an iterative algorithm for calculating probability distributions (also called marginals) for network parameters that represent the absence of presence of an interaction between nodes. The iteration consists of many update calculations to a single marginal by way of altering a single factor. A factor is the independent contribution to the marginal for a single parameter based on the data in a single experiment or training pattern. This figure is a summary of how the update calculation of BP works.

The primary goal of this figure is to accompany the symbols and equations with a simple and recognizable visual analog, so that instead of thinking about x's, y's, indices, sums and products one can visualize distributions, arrows and colors. In this sense, this is a sort of visual glossary for the otherwise abstract symbols and equations. A secondary goal is to provide the viewer with context for how each factor is related to the other factors even though they are independent contributions to the overall marginal. For these reasons, it is imperative that all visual cues and associates are entirely self-consistent. Examples of this self-consistency include (a) shades of pink are related to probability distributions (b) the blue curve on the top is the same blue curve on the bottom, and (c) the non-zero weights are consistently matched to the j-indices.

In the end, I hope the viewer sees how factors come together to form beliefs about the global system, are then used to update one factor, which in future iterations, is part of the global system. It is in this respect that this algorithm is a propagation of beliefs until a maximum set of globally consistent beliefs are reached.

DESCRIPTION

The top half is a summary of the global information and its preparation for use in the update. The bottom half is the update to one factor based on the global information. The update is performed for a single factor, which is specific to one parameter (hexagon) and the data in a single experiment (square), denoted the cavity parameter and condition, respectively.

In the top, big arrows point from model parameters (blue hexagons) to individual experimental conditions (tan boxes). These arrows indicate that the algorithm is, in some sense, receiving the information contained by the model parameters. In this case, the form of this information is the aggregate probability distribution for each of the model parameters where the contribution from the cavity condition is excluded. All of these distributions are bundled together into a mean-field approximation and turns into a single distribution (blue curve) describing the likely affect from all the other model parameters or interactions.

In the bottom, the arrow points from the cavity condition to the cavity parameter, indicating that the cavity parameter will be updated to be constrained to the data in the cavity condition. The update is a replacement of the old factor distribution with a new one. Each possible assignment of the cavity parameter is given a score (shade of red) that is proportional to the overlap of two distributions; the first is the mean-field (blue curve), and the other is the fitness (green curve) of this parameter assignment. The larger the intersection, the more consistent the parameter assignment is with the rest of the system (blue curve). One can think of the green curve as the extent to which any mean-field value fits the data and the parameter assignment. The blue curve is the probability that the system is likely to take on a mean-field value.

Data Exploration Tool for analyzing inferred network models

#4 (Winter, 2011-2012): Exploratory data analysis of network analysis

Most visual data analysis is concerned with communicating a result. Another important but under-appreciated goal of visualization is to explore and assist a researcher in finding the result in the first place, which is sometimes called exploratory data analysis. For this project, I worked on a tool with a coworker to develop a visual and interactive tool to look at many performance features of many network models over a single node space.

A demo of this application is available for download (here)

The scientific motivation for this project was to evaluate the results of our network inference pipeline, which can generate hundreds of unique network models of protein response to drug perturbations. In short, we wanted to generate a simple, aesthetic and interactive tool for quick hypothesis generation and testing. In addition to interpreting results, we hope that this tool could develop intuition about the strengths and weaknesses of this algorithm and ultimately lead to improvements. Although the scope of this tool is restricted to interpreting one kind of model generated from a single algorithm, one could incorporate other model types into this kind of tool.

DESCRIPTION

The tool consists of two interactive, linked windows dedicated to showing model performance features and model connectivities, respectively. Selecting a subset of models in the first window, will automatically overlay all member models in the other. The model feature window is a parallel coordinate plot, where each model is associated with a single score for each feature, and each feature is plotted on consecutive parallel axes. Lines connect each model's score across all features, and the collection of such lines can visually bring out trends in feature space. This window has several interactive options including the ability to add and delete features, rearrange axes, color the lines by the score of any axis and highlighting user-defined subsets. The model viewer window simultaneously visualizes all user-selected models using a unique overlay arrangement. Users can rearrange the nodes in this window.

Conceptual animation of the bump hole strategy

# 5 (Summer, 2012): 3D Animation of bump-hole concept

This animation was conceived by a graduate student in a chemistry lab at Memorial Sloan Kettering Cancer Center. The goal of this animation is to introduce the research problem and conceptually demonstrate the solution. This animation is not a result and is exclusively designed for communication of a concept.

The end goal of this lab is to track the substrate of a specific enzymatic reaction. The general strategy is to add an alkyne chain to a co-factor, which then binds to the enzyme and transfers the alkyne chain to the substrate. The problem is that the co-factor with the alkyne chain is no longer able to bind to the enzyme because the alkyne chain does not fit into the active site. The solution that we are attempting to conceptually communicate is the introduction of a mutation to one residue near the active site of the enzyme that does not alter function but makes room for the co-factor plus alkyne chain. The mutated enzyme then catalyzes the reaction, the alkyne chain is transferred to the substrate and the end goal of tracking the substrate is now possible.

The client requested some aesthetic appeal to keep the attention of a general scientific audience. He also requested using scientifically accurate representations where appropriate, and to avoid misleading representations of scientific accuracy when illustrating points.

Aesthetic features:

reflections: the scene includes a reflective ground surface
background and lighting: the scene does not occur in empty space.

panning cameras: moving around in the scene gives the viewer a sense of spatial exploration and dynamics

transparency: some objects in the scene are hidden with transparency to draw the viewer's attention to other objects; other times, transparency produces a window for the viewer to see things that would otherwise be hidden.

glow effect: the alkyne chain glows to connect it to its function as a tag; the enzyme glows to represent a reaction.

the big bang: the complex disassembles and the tagged substrate flies into the camera to give the viewer a sense of action.

Scientific considerations:

molecular structure: all three molecules are accurate representations from real pdb data files; the co-factor plus alkyne chain were manually constructed based on known coordinates for all atoms.

binding properties: the binding representation is entirely conceptual and does not reflect any measured kinetics or conformational changes; we utilize the idea of mutually exclusive space occupancy to conceptually indicate that the alkyne chain does not fit into the active site of the enzyme, although it is not scientifically accurate; no energy considerations were taken into account.

mutation: the exact tyrosine to glycine mutation is not depicted here; rather we exaggerate the change in size of the binding pocket to make it clear that the mutation changes the structure of the enzyme to accommodate the co-factor and alkyne chain.

Algorithm Visualizations

Published: January 18th 2013

Algorithm Visualizations

Owner

Algorithm Visualizations

Movies and images for visualizing algorithms.

Tools

Creative Fields