NASA - National Aeronautics and Space Administration

NAS TECHNICAL HIGHLIGHTS

NAS Optimizations Attain 2x Speedup for US3D Hypersonics Modeling Code

05.27.09
To date, the NASA Advanced Supercomputing (NAS) Division's effort to optimize the US3D computational fluid dynamics (CFD) code has resulted in a 2.6x speedup in the solve routine, and a 2x speedup for the code overall, when executing a standard real-world test case of over 30 million grid elements on the Pleiades supercomputer.

US3D is a next-generation CFD code that uses an unstructured grid-based approach. The code is under development by a team at the University of Minnesota, led by I. Nompelis, and supports the Aerodynamics, Aerothermodynamics and Plasmadynamics (AAP) discipline of the NASA Hypersonics Project. US3D significantly advances the ability to model complex geometries accurately and at high fidelity, with appreciably improved rates of convergence.

The NAS optimization effort has been underway for just a few months. Continuing work to reorder the extensive indirect addressing in the code is expected to yield at least another factor of two in overall performance within a similar timeframe, for a cumulative speedup of roughly 4x over the original code.
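
The article does not describe how the indirect addressing is being reordered. Purely as an illustration of the general idea, the Python sketch below renumbers the cells of a tiny hypothetical mesh breadth-first, so that cells sharing a face receive nearby indices and indirectly addressed accesses tend to touch neighboring memory. The function names, the eight-cell chain, and the choice of a breadth-first ordering are all assumptions and are not taken from US3D.

```python
from collections import deque


def breadth_first_renumbering(num_cells, faces):
    """Return new_id, where new_id[old] is the breadth-first index of cell `old`."""
    neighbors = [[] for _ in range(num_cells)]
    for left, right in faces:
        neighbors[left].append(right)
        neighbors[right].append(left)

    new_id = [-1] * num_cells
    next_id = 0
    # Try low-degree cells first as starting points (an assumed heuristic).
    seeds = sorted(range(num_cells), key=lambda c: (len(neighbors[c]), c))
    for seed in seeds:
        if new_id[seed] != -1:
            continue  # already reached from an earlier seed
        new_id[seed] = next_id
        next_id += 1
        queue = deque([seed])
        while queue:
            cell = queue.popleft()
            for other in sorted(neighbors[cell]):
                if new_id[other] == -1:
                    new_id[other] = next_id
                    next_id += 1
                    queue.append(other)
    return new_id


def max_index_jump(faces):
    """Largest index distance between the two cells on either side of a face."""
    return max(abs(left - right) for left, right in faces)


# Hypothetical mesh: eight cells in a row, labeled in a scrambled order.
chain = [3, 7, 0, 5, 2, 6, 1, 4]
faces = list(zip(chain, chain[1:]))

new_id = breadth_first_renumbering(len(chain), faces)
renumbered_faces = [(new_id[left], new_id[right]) for left, right in faces]

print("largest index jump before:", max_index_jump(faces))
print("largest index jump after: ", max_index_jump(renumbered_faces))
```

In a real unstructured code the same permutation would also have to be applied to every array indexed by cell number; the sketch only shows how the face-to-cell index pairs change.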

The US3D code supports both tetrahedral and prismatic cells. Because of its piecewise structured grid, the code can make use of line implicit successive over-relaxation (SOR) techniques to significantly improve solution convergence rates. The matrix solve and the viscous flux calculations are the most time-consuming components of the code; as in other codes, the matrix solve dominates the runtime.
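
For readers unfamiliar with SOR, the following is a minimal Python sketch of plain point SOR on a small dense system. It is not the line implicit variant used in US3D, which relaxes whole grid lines at a time; the function name, the relaxation factor of 1.2, and the 3x3 test system are assumptions chosen only for illustration.

```python
def sor_solve(a, b, omega=1.2, tol=1e-10, max_iter=10_000):
    """Solve a x = b by point successive over-relaxation.

    `a` is a list of rows with non-zero diagonal entries. Returns the solution
    and the number of sweeps taken.
    """
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Uses values already updated in this sweep (Gauss-Seidel style).
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / a[i][i]
            # Blend the old value with the Gauss-Seidel update; omega > 1 over-relaxes.
            new_value = (1.0 - omega) * x[i] + omega * gauss_seidel
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, sweep
    raise RuntimeError("SOR did not converge within max_iter sweeps")


# Small symmetric, diagonally dominant system whose exact solution is (1, 2, 3).
a = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

solution, sweeps = sor_solve(a, b)
print(solution, "after", sweeps, "sweeps")
```

Setting omega to 1.0 reduces the update to ordinary Gauss-Seidel, which makes it easy to compare sweep counts for a given system.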

Early NAS optimization work identified a number of operations in the solve that could be converted to REAL*4. Using 32-bit floating-point data and operations (rather than the default 64-bit) makes more effective use of cache, memory bandwidth, inter-node communication, and memory capacity. The off-diagonal terms of the matrix computation were converted first, along with a number of temporary storage arrays. Other optimizations included loop reordering, loop fusion, and additional loop unrolling.
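
The storage side of this trade-off can be seen in a few lines of Python using the standard `array` module, whose "f" and "d" type codes hold 32-bit and 64-bit floats on common platforms. The values below are made-up stand-ins for off-diagonal matrix terms, not US3D data, and the sketch says nothing about which quantities in a real solver can tolerate the reduced precision.

```python
from array import array

# Hypothetical off-diagonal terms (illustrative values only).
values = [0.1 * k for k in range(1, 1001)]

terms_64 = array("d", values)  # 64-bit storage, the default precision
terms_32 = array("f", values)  # 32-bit storage, each value rounded on entry

bytes_64 = len(terms_64) * terms_64.itemsize
bytes_32 = len(terms_32) * terms_32.itemsize

# Relative error introduced by rounding each value to 32 bits.
max_rel_error = max(
    abs(single - double) / abs(double)
    for single, double in zip(terms_32, terms_64)
)

print("64-bit storage:", bytes_64, "bytes")
print("32-bit storage:", bytes_32, "bytes")
print("largest relative rounding error:", max_rel_error)
```

Halving the bytes per value means twice as many terms fit in each cache line and each message, at the cost of keeping only about seven significant decimal digits per term.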

Further code restructuring allowed the compiler to generate code that takes advantage of the floating-point acceleration provided by the "pipelining" features of the Xeon processors used in Pleiades.

Plot of performance results of the US3D code for the sample problem, when run on Pleiades. The original code and the latest optimized code are compared for various core counts, as well as when executed on different numbers of active cores within a Xeon processor (ppn = processes per node). Scaling and the relative speedup for the new code are preserved across all comparable runs.


For more information about this activity, please contact: Jim Taft, James.R.Taft@nasa.gov

For information about NASA and agency programs on the Web, visit http://www.nasa.gov/home/

Editor: Jill Dunbar
Webmaster: John Hardman
NASA Official: Rupak Biswas

Last Updated: August 20, 2009