05.27.09
To date, the NASA Advanced Supercomputing (NAS) Division's effort to optimize the US3D computational fluid dynamics (CFD) code has resulted in a 2.6x speedup in the solve routine, and a 2x speedup for the code overall, when executing a standard real-world test case of over 30 million grid elements on the Pleiades supercomputer.
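As a rough consistency check on these two figures, Amdahl's law relates the speedup of one routine to the speedup of the whole code. The sketch below assumes, purely for illustration, that the solve routine was the only part of the code that got faster; the article does not state this, and the function names are hypothetical.

```python
def overall_speedup(fraction, routine_speedup):
    """Amdahl's law: whole-code speedup when `fraction` of the original
    runtime is accelerated by a factor of `routine_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / routine_speedup)


def implied_fraction(overall, routine_speedup):
    """Invert Amdahl's law for the share of runtime spent in the routine."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / routine_speedup)


# Illustrative assumption: the 2.6x solve speedup is the sole source of the 2x gain.
solve_fraction = implied_fraction(overall=2.0, routine_speedup=2.6)
print(f"implied solve share of original runtime: {solve_fraction:.3f}")
print(f"check, overall speedup: {overall_speedup(solve_fraction, 2.6):.3f}")
```

Under that assumption the solve would have accounted for roughly four-fifths of the original runtime. If other routines were also accelerated, the true share would differ.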
US3D is a next-generation CFD code that uses an unstructured grid-based approach. The code is under development by a team at the University of Minnesota, led by I. Nompelis, and supports the Aerodynamics, Aerothermodynamics and Plasmadynamics (AAP) discipline of the NASA Hypersonics Project. US3D offers a significant advance in the ability to model complex geometries accurately at high fidelity, with appreciably improved rates of convergence.
NAS' optimization effort has been under way for only a few months. Ongoing work to reorder the code's extensive indirect addressing is expected to yield at least another factor of two in overall performance within a similar timeframe, for a cumulative speedup of about 4x.
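To illustrate why reordering can help with indirect addressing, the sketch below renumbers the cells of a small mesh so that cells sharing a face get nearby indices, which tends to make indirect memory accesses more local. The article does not say which reordering NAS is applying; the breadth-first renumbering, the shuffled 1D chain used as a mesh, and all names here are assumptions chosen for illustration.

```python
import random
from collections import deque


def bandwidth(faces):
    """Largest index gap between the two cells sharing a face."""
    return max(abs(a - b) for a, b in faces)


def bfs_renumber(n_cells, faces):
    """Return new_id[old_id] from a breadth-first sweep of the cell graph."""
    neighbors = [[] for _ in range(n_cells)]
    for a, b in faces:
        neighbors[a].append(b)
        neighbors[b].append(a)
    new_id = [-1] * n_cells
    next_id = 0
    # Try low-degree cells first so each sweep starts near a boundary.
    for start in sorted(range(n_cells), key=lambda c: len(neighbors[c])):
        if new_id[start] != -1:
            continue
        new_id[start] = next_id
        next_id += 1
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for nb in neighbors[cell]:
                if new_id[nb] == -1:
                    new_id[nb] = next_id
                    next_id += 1
                    queue.append(nb)
    return new_id


# Hypothetical mesh: a 1D chain of cells whose labels have been shuffled.
rng = random.Random(0)
n_cells = 1000
labels = list(range(n_cells))
rng.shuffle(labels)
faces = [(labels[i], labels[i + 1]) for i in range(n_cells - 1)]

new_id = bfs_renumber(n_cells, faces)
reordered = [(new_id[a], new_id[b]) for a, b in faces]
print("index gap before:", bandwidth(faces))
print("index gap after: ", bandwidth(reordered))
```

A smaller index gap means that a loop over faces touches cell data stored closer together, which is generally friendlier to cache. Real unstructured meshes will not reduce as cleanly as this chain does.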
The US3D code supports both tetrahedral and prismatic cells. Because of its piecewise structured grid, the code can make use of line-implicit successive over-relaxation (SOR) techniques to significantly improve solution convergence rates. The matrix solve and the viscous flux calculations are the most time-consuming components of the code and, as in many other CFD codes, the matrix solve dominates the runtime.
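The idea behind line-implicit SOR can be sketched on a model problem. The example below is not US3D's scheme: it assumes a small structured 2D grid, a 5-point Laplacian with zero boundary values, and a constant source term, all chosen for illustration. Each grid line is solved implicitly with a tridiagonal (Thomas) solve, coupled explicitly to its neighboring lines, and then over-relaxed by a factor omega.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (no pivoting)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / denom
        d[k] = (rhs[k] - lower[k] * d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x


def residual_norm(u, b):
    """Max-norm residual of the 5-point system with zero boundary values."""
    nx, ny = len(u), len(u[0])
    worst = 0.0
    for i in range(nx):
        for j in range(ny):
            r = b[i][j] - 4.0 * u[i][j]
            if i > 0:
                r += u[i - 1][j]
            if i + 1 < nx:
                r += u[i + 1][j]
            if j > 0:
                r += u[i][j - 1]
            if j + 1 < ny:
                r += u[i][j + 1]
            worst = max(worst, abs(r))
    return worst


def line_sor(omega, n=16, tol=1e-8, max_sweeps=5000):
    """Sweep until the residual drops below tol; return (sweeps, residual)."""
    b = [[1.0] * n for _ in range(n)]  # hypothetical constant source term
    u = [[0.0] * n for _ in range(n)]
    lower, diag, upper = [-1.0] * n, [4.0] * n, [-1.0] * n
    res = residual_norm(u, b)
    for sweep in range(1, max_sweeps + 1):
        for i in range(n):
            # Couple to neighboring lines explicitly, using the latest values.
            rhs = list(b[i])
            for j in range(n):
                if i > 0:
                    rhs[j] += u[i - 1][j]
                if i + 1 < n:
                    rhs[j] += u[i + 1][j]
            # Solve the whole line implicitly, then over-relax toward it.
            line = solve_tridiagonal(lower, diag, upper, rhs)
            for j in range(n):
                u[i][j] += omega * (line[j] - u[i][j])
        res = residual_norm(u, b)
        if res < tol:
            return sweep, res
    return max_sweeps, res


gs_sweeps, gs_res = line_sor(omega=1.0)    # no over-relaxation
sor_sweeps, sor_res = line_sor(omega=1.5)  # over-relaxed
print("sweeps with omega = 1.0:", gs_sweeps)
print("sweeps with omega = 1.5:", sor_sweeps)
```

On this model problem the over-relaxed sweep should need noticeably fewer iterations than the unrelaxed one. The value 1.5 is an arbitrary illustrative choice, not a tuned or recommended setting.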
Early NAS optimization work identified a number of operations in the solve that could be converted to Real*4. Using 32-bit floating-point data and operations (rather than the default 64-bit) allowed for more effective use of cache, memory bandwidth, inter-nodal communication, and memory. The off-diagonal terms of the matrix computation were converted first, as were a number of temporary storage arrays. Other optimizations included loop reordering, loop fusion, and additional loop unrolling.
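The storage side of that trade-off can be shown in a few lines. The sketch below uses Python's standard `array` module as a stand-in for Fortran REAL*8 and REAL*4 arrays; the array sizes and values are hypothetical, and it assumes the usual 8-byte and 4-byte item sizes for the `"d"` and `"f"` type codes.

```python
from array import array

# Hypothetical data: six off-diagonal entries per row for 1000 rows.
n_rows, n_off = 1000, 6
values = [-1.0 / (3.0 + k % 7) for k in range(n_rows * n_off)]

off64 = array("d", values)  # 64-bit storage (like REAL*8)
off32 = array("f", values)  # same values rounded to 32-bit (like REAL*4)

bytes64 = off64.itemsize * len(off64)
bytes32 = off32.itemsize * len(off32)
print("64-bit storage:", bytes64, "bytes")
print("32-bit storage:", bytes32, "bytes")

# Price paid: each stored value carries a small relative rounding error.
worst_rel = max(abs(a - b) / abs(a) for a, b in zip(off64, off32))
print(f"worst relative rounding error: {worst_rel:.2e}")
```

Halving the bytes per value means twice as many values fit in cache and in each memory transfer or message. Whether the resulting rounding error is acceptable depends on the terms being converted.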
The code was also restructured so that the compiler can generate instructions that take advantage of the floating-point "pipelining" features of the Xeon processors used in Pleiades.
For more information about this activity, please contact:
Jim Taft
james.r.taft@nasa.gov
For information about NASA and agency programs on the Web, visit:
http://www.nasa.gov/home