Visualizer Implemented for Understanding Pleiades InfiniBand Behaviors
05.28.09
The NAS visualization team has developed and implemented a utility that provides a visual overview of the complex hypercube InfiniBand (IB) fabric on Pleiades, the largest such fabric built to date.
Currently in a preliminary stage and relying on static analysis, the visualizer will help systems engineers understand the behavior of the network and its router structure, make informed decisions about potential improvements, and observe the results of any changes made to the infrastructure. A more efficient InfiniBand interconnect will speed up computations, reduce overlap (link sharing) between jobs, and improve data movement to the Lustre filesystem and the hyperwall-2 visualization system.
A snapshot from the visualization utility shows one plane of the (partial) 10-D hypercube InfiniBand (IB) fabric on the Pleiades supercomputer. Nodes (dots) represent switches and are colored by degree (number of connections). Green and red edges are inter-switch IB links, with red signifying dimension 1 (port 9). Magenta edges show all routes in a hypothetical 16-switch all-to-all communication; their saturation indicates the number of routes sharing each link (illustrating "self-overlap").
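Because the fabric is a partial hypercube, a switch's degree and each link's dimension follow directly from the switch labels. The sketch below assumes each switch carries an integer hypercube label and that two switches are linked exactly when their labels differ in one bit; the function names, the 0-based dimension numbering, and the labeling itself are hypothetical, and the real Pleiades port assignments (such as port 9 for dimension 1) are not modeled.

```python
from collections import Counter


def hypercube_links(switches, dims):
    """Yield (u, v, dim) for each link of a partial hypercube.

    Assumes (hypothetically) that switches u and v are linked exactly when
    their integer labels differ in a single bit; dim is that bit's 0-based index.
    """
    present = set(switches)
    for u in sorted(present):
        for d in range(dims):
            v = u ^ (1 << d)
            # Emit each undirected link once, and only if both ends exist.
            if u < v and v in present:
                yield u, v, d


def degrees(switches, dims):
    """Number of hypercube links incident on each present switch."""
    deg = Counter({s: 0 for s in switches})
    for u, v, _ in hypercube_links(switches, dims):
        deg[u] += 1
        deg[v] += 1
    return deg
```

For example, in a 3-D cube missing switch 7, `degrees(range(7), 3)` gives its former neighbors 3, 5, and 6 a degree of 2 while the other switches keep a degree of 3. This variation is what coloring by degree makes visible in a partial cube.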
The utility interprets raw data from the IB switches and the fabric manager: it reads the individual link output, reconstructs and draws the overall 10-D topology from the resulting collection of individual links, and colors the nodes by their number of connections. The visualization team also wrote the graphics code that displays all of the above, allows various queries and manipulations (e.g., selectively highlighting each dimension of the cube), and supports high-dimensional rotations. Additionally, the switch forwarding tables are consulted to construct and display routes between any two nodes, or among a group of nodes communicating in any of several common patterns (all-to-all, all-to-one, random).
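The reconstruction and route-tracing steps can be sketched as follows. This is a minimal illustration, not the team's code. It assumes a hypothetical record format of one `(switch_a, port_a, switch_b, port_b)` tuple per inter-switch cable, and a simplified per-switch forwarding table that maps a destination switch to an output port (a stand-in for an IB linear forwarding table, which is keyed by destination LID). All names are hypothetical.

```python
from collections import defaultdict


def build_topology(link_records):
    """Rebuild the fabric graph from individual link records.

    link_records: iterable of (switch_a, port_a, switch_b, port_b), one per
    inter-switch cable (hypothetical format).
    Returns a (switch, port) -> neighbor map and a switch -> neighbors map.
    """
    port_map = {}
    adjacency = defaultdict(set)
    for a, pa, b, pb in link_records:
        port_map[(a, pa)] = b
        port_map[(b, pb)] = a
        adjacency[a].add(b)
        adjacency[b].add(a)
    return port_map, adjacency


def trace_route(src, dst, forwarding, port_map, max_hops=64):
    """Follow per-switch forwarding tables hop by hop from src to dst.

    forwarding[switch][dst] is the output port to take toward dst
    (a simplified, hypothetical layout). Raises if the route exceeds
    max_hops, which usually means the tables contain a loop.
    """
    path = [src]
    current = src
    while current != dst:
        if len(path) - 1 >= max_hops:
            raise RuntimeError(f"no route to {dst} within {max_hops} hops")
        port = forwarding[current][dst]
        current = port_map[(current, port)]
        path.append(current)
    return path
```

For a four-switch square with port 1 on dimension 0 and port 2 on dimension 1, the records `[(0, 1, 1, 1), (2, 1, 3, 1), (0, 2, 2, 2), (1, 2, 3, 2)]` and the tables `{0: {3: 1}, 1: {3: 2}}` give `trace_route(0, 3, ...)` the path `[0, 1, 3]`. Node degree, used for coloring, is then simply `len(adjacency[s])`.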
So far, static analysis shows that typical communication patterns among even moderate numbers of nodes contain many routes that share links; that is, some links between pairs of switches are used by numerous routes. As a result, a single job may overlap (contend for links) with itself.
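Self-overlap can be quantified by counting how many routes of one communication pattern cross each directed link. The sketch below does this for an all-to-all pattern. Because it must run on its own, it swaps in dimension-order ("e-cube") routing on a full hypercube. This is a textbook stand-in, not the routing Pleiades actually uses, whose routes come from its switch forwarding tables. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def ecube_route(src, dst, dims):
    """Dimension-order route on a full hypercube: flip the differing
    address bits from lowest to highest (stand-in routing only)."""
    path = [src]
    current = src
    for d in range(dims):
        if ((current ^ dst) >> d) & 1:
            current ^= 1 << d
            path.append(current)
    return path


def link_sharing(group, dims):
    """Count how many all-to-all routes among `group` use each directed
    link (u, v); any count above 1 is a job overlapping with itself."""
    usage = Counter()
    for src, dst in permutations(group, 2):
        path = ecube_route(src, dst, dims)
        usage.update(zip(path, path[1:]))
    return usage
```

With all 16 switches of a 4-D subcube taking part, `link_sharing(range(16), 4)` shows every one of the 64 directed links carrying 8 of the 240 routes. Even under this idealized routing, a single 16-switch job shares every link with itself.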
In the near future, the utility will be used with a live feed of traffic from the active IB fabric, including many running jobs and filesystem activity. Information gained from the visualizer is expected to benefit all users by enabling dynamic re-routing of communication on the Pleiades IB interconnect to reduce link contention.
For more information about this activity, please contact: Chris Henze, Christopher.E.Henze@nasa.gov