NASA - National Aeronautics and Space Administration






Pleiades-SGI ICE Supercomputer


An at-a-glance account (in reverse chronological order) of the construction of Pleiades, NASA's newest supercomputer, named after the star cluster in the constellation Taurus. The system was built and tested throughout summer 2008, and became available to users for production computing in December 2008.


PLEIADES CONSTRUCTION

12.11.08 - Ceremony Held for Official Pleiades Launch
A ribbon-cutting ceremony to officially recognize the installation and operation of the Pleiades supercomputer was held at the NAS facility at Ames. The event included guest speakers from industry partners and NASA management, tours of the NAS computer room and the hyperwall-2 visualization system, and a Q&A session. In addition, Pleiades team members gave interviews to attending local news media.

11.17.08 - Pleiades Ranked Third-fastest Supercomputer on TOP500 List
Pleiades attained the number three spot on the Top500 list of the world's most powerful computers, announced at the International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC08) in Austin. The system's 12,800 Intel® Xeon® quad-core processors (51,200 cores, 100 racks) clocked in at 487 trillion floating point operations per second (teraflops) on the LINPACK benchmark.

In addition, Pleiades came in at #22 (233 megaflops per watt) on the list of most energy-efficient supercomputers in the world on the November 2008 Green500 list. Importantly, when combining energy efficiency and computational power, Pleiades comes in at #2 among the world's general-purpose supercomputers.
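As a rough cross-check, the two figures quoted in this entry (487 teraflops on LINPACK and 233 megaflops per watt) imply a power draw of about 2 megawatts. The sketch below assumes the Green500 rating was computed from the same LINPACK result, which the entry does not state; the function and variable names are illustrative only.

```python
# Figures quoted in the 11.17.08 entry.
LINPACK_TFLOPS = 487.0            # LINPACK result, in teraflops
GREEN500_MFLOPS_PER_WATT = 233.0  # Green500 efficiency rating


def implied_power_megawatts(tflops, mflops_per_watt):
    """Power draw implied by a sustained rate and an efficiency rating.

    Assumption: both numbers describe the same benchmark run.
    """
    flops = tflops * 1e12                  # teraflops -> flops
    flops_per_watt = mflops_per_watt * 1e6  # megaflops/W -> flops/W
    watts = flops / flops_per_watt
    return watts / 1e6                     # watts -> megawatts


power_mw = implied_power_megawatts(LINPACK_TFLOPS, GREEN500_MFLOPS_PER_WATT)
print(f"Implied power draw: {power_mw:.2f} MW")
```

This is an estimate derived from published ratios, not a measured facility load.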

11.10.08 - Fine-tuning and Problem Resolution Continues
NAS successfully resolved a number of system issues, including out-of-memory conditions and reboots, resetting of switches, and Saturn (home filesystem) crashes. The systems team created a new Lustre filesystem (219 terabytes), primarily for an ECCO (Estimating the Circulation & Climate of the Ocean) code run for the Science Mission Directorate. 14,000 nodes were made available to users representing all NASA mission directorates.

NAS control room and applications team members helped support Pleiades and its users by creating queues and fine-tuning the Portable Batch System.

10.13.08 - Pleiades has Successful 92-Rack LINPACK Run
NAS successfully completed Pleiades' 92-rack run on the LINPACK benchmark. To accomplish the run, NAS and SGI engineers first created the 64-rack configuration by connecting two 32-rack systems via an overhead trellis containing 512 InfiniBand (IB) cables. NAS staff also installed IB cables and a new operating system, set up home filesystems, and conducted initial systems testing. The benchmark performance yielded 462 teraflops with 81.7% efficiency.

A significant number of new copper and fiber cables were installed to support a 100-rack LINPACK run (see 10.06.08, below). The 100-rack test was attempted, but a blown circuit breaker caused Pleiades to lose power; the breaker was quickly replaced.
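The efficiency figure for the 92-rack run can be turned into an implied theoretical peak. The sketch below assumes that efficiency means measured LINPACK rate divided by theoretical peak, and that all 92 racks hold 512 cores each (the per-rack figure given in the 06.06.08 entry); neither assumption is stated for this run, and the names are illustrative.

```python
CORES_PER_RACK = 512  # per-rack core count from the 06.06.08 entry (assumed to apply here)


def implied_peak_tflops(measured_tflops, efficiency):
    """Theoretical peak implied by a measured rate and an efficiency fraction."""
    return measured_tflops / efficiency


racks = 92
cores = racks * CORES_PER_RACK                          # total cores in the run
peak_tflops = implied_peak_tflops(462.0, 0.817)         # 462 TF at 81.7% efficiency
per_core_gflops = peak_tflops * 1000.0 / cores          # teraflops -> gigaflops per core

print(f"Cores: {cores}")
print(f"Implied peak: {peak_tflops:.1f} teraflops")
print(f"Implied peak per core: {per_core_gflops:.2f} gigaflops")
```

Under these assumptions the run covers 47,104 cores with an implied peak of roughly 565 teraflops, or about 12 gigaflops per core.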

10.06.08 - NAS Readies for 100-Rack SGI ICE Benchmark
NAS' systems, security and network teams worked overtime to reconfigure Pleiades 3 and Pleiades 4 inside the High End Computing Capability (HECC) enclave for internal staff testing. A trellis tray was designed as a possible solution for routing the 200 fiber cables required to interconnect Pleiades into a new 92-rack configuration.

NAS also geared up for a LINPACK benchmark run next week, and researched a possible 100-rack SGI ICE run by connecting 8 RTJones racks with Pleiades—pending arrival of more InfiniBand cables.

09.29.08 - Pleiades Now in 64-Rack Configuration
Over the weekend, NAS created a 64-rack, 32,768-core Pleiades configuration, connecting P2 and P3 (32 racks each) via an overhead trellis system containing 512 InfiniBand (IB) cables. SGI engineers ran extensive IB diagnostics, and NAS staff completed additional tests. The necessary IB cables were ordered for the final 84-rack configuration.

The NAS Application Performance and Productivity team continued assisting users with testing various codes and libraries on Pleiades.

09.22.08 - Progress Continues on Pleiades Build, Test
With the first 8 racks of Pleiades (P1) stable and in "production" mode, the NAS team focused on making some adjustments to P2, the 32-rack system, and on additional user testing. The Systems team added extra memory to P2, bringing one rack to 2 gigabytes per core. They also made progress on resolving an out-of-memory problem that surfaced. In addition, all 16 compute racks for P3 were installed, along with system software, and testing began. Systems staff discovered a defective compute blade and a defective dual in-line memory module (DIMM), which were quickly replaced.

The Application Performance and Productivity (APP) group continued testing, and spent the necessary time with advanced users to explain the subtleties of certain job runs on Pleiades. A beta test user ran a large grid of the V-22 tiltrotor configuration (150 million total grid points), which helped establish far-field boundary condition behavior in hover, and provided a good stress test for the new machine.

09.02.08 - Original 40 Pleiades Racks on Floor
The original 40-rack (20,480 core) Pleiades system was installed on the NAS computer floor, in an 8-rack and 32-rack configuration. System diagnostics on 32 racks were completed, along with security clearances. NAS continued working on diagnostics and LINPACK testing; on 15,256 cores, the LINPACK number came in at 158 teraflops—81.2% efficiency.

Performance/scaling comparisons of applications such as Overflow, CART3D, ECCO, and USM3D on Columbia, RTJones, and Pleiades also continued. One "ultra-large" case (96 million grid points) showed that Pleiades is 40-45% faster than RTJones.

Preparations for the additional 44 Pleiades racks purchased at the end of July progressed, including equipment moves, installation of 2 new RAIDs, more power distribution units, and associated whips (electrical cords that run from the power distribution units on the floor to the computer racks). A shipment of 16 racks arrived on August 29th, with the next 16 scheduled for delivery on September 12th. In the current configuration, Pleiades will comprise 84 racks total.

08.18.08 - First 32 Pleiades Racks Tested
Engineers began an extensive (3-5 week) hardware testing phase on the 32 SGI ICE 8200EX racks installed last week. These new racks were initially tested independently of the 8 racks installed in June, which allowed the original racks to continue serving users and Applications group staff while the 32 new racks were carefully analyzed and tuned for performance.

Diagnostics and benchmarks will ensure that performance scales on this large system—Pleiades is the largest SGI ICE installation to date. Tests so far resulted in minimal hardware failures. NAS staff also ran experimental workload applications using 4,096 cores (8 racks) to check for system stability.

A new file system that doubles the bandwidth and space for Pleiades was created and tested, and NAS staff worked with SGI to integrate the 300-gigabyte Lustre metadata server and test InfiniBand (IB) modifications. Nearly 1,800 IB cables were connected to Pleiades.

08.01.08 - Pleiades Gets 32 Additional Systems
The NAS facility buzzed with activity this week, with the arrival of 32 additional SGI ICE 8200EX racks, the installation of 6 new (and 2 relocated) power distribution units, installation of a pump package for the Pleiades water cooling system, relocation of 16 Columbia racks to Ames building 233, and general computer floor clean-up. The 48 disk drives on the Lustre metadata server for Pleiades were upgraded from 73 gigabytes (GB) to 300 GB each. All system deliveries, as well as the electrical and plumbing work for the initial system order, neared completion.

A second purchase order associated with the NAS Technology Refresh was placed on July 28th for 44 additional SGI ICE racks—doubling the size of Pleiades to over 43,000 cores.

07.25.08 - Pleiades Testing Nears Completion, Facilities Work Continues
Pleiades and InfiniBand router testing continued during the week on the 8 compute cabinets (4,096 cores) installed. The NAS Applications group continued to characterize performance for various applications. One user was invited to begin testing, and additional users will be given access next week. Ongoing porting and scaling work showed that ModelE runs 30% faster on Pleiades than on RTJones.

The NAS facility power panels were upgraded to support the additional power requirements of Pleiades, with 1200-amp panels replacing old 800-amp panels. Power outages associated with the facility upgrade project caused some processing issues that were handled quickly.

07.17.08 - Installation Effort Quickly Overcomes Small Setbacks
The facility upgrade project continued over the past two weeks, highlighted by testing and "energizing" of the new power complex. Additional power distribution units for Pleiades were delivered, and installation and power panel activation began. Due to the sheer volume and complexity of changes, some unexpected outages occurred. NAS staff acted quickly to mitigate the effects of the outages.

Application testing on Pleiades continued, and most applications showed a 20-35% performance improvement over RTJones. A set of NAS Parallel Benchmark performance tests was conducted on Pleiades. The system demonstrated a roughly 30% per-processor performance improvement over RTJones across a wide range of processor counts.

07.03.08 - Facility Upgrade for Pleiades Completed, Testing Continues
The facility shutdown on June 28th to upgrade the chilled water system to accommodate Pleiades went smoothly, with plumbing and electrical work completed, including installation of a 450-ton chiller. Electrical breakers and panels arrived mid-week, and preparations were made to pour a concrete pad for the pump package delivery. The computing systems returned to service on schedule, and some associated hardware issues were handled.

Application testing continued and comparisons were made to theoretical performance numbers. CART3D and FUN3D testing continued, and the latter's results for the sample data set showed that Pleiades is 35-38% faster than RTJones, using 64 cores (8 cores in each of 8 nodes). The NAS Parallel Benchmarks ran successfully on the four newest Pleiades racks, with performance numbers very similar to those obtained previously on the first four racks. Several workload stress tests were run, and verification and validation for each application were completed.

Testing of the Pleiades filesystem also began, and an approach to storage monitoring is in the works.

06.27.08 - Test Work Steps up on Pleiades
All 4,096 cores of the first two Pleiades deliveries have been readied for substantial applications testing, with security checks on the second set of Pleiades systems completed. The legacy parallel code FUN3D was placed on Pleiades to begin performance testing. FUN3D is integral to aerodynamic design work within NASA and industry. Tests were also run on other workload applications including CART3D. Limited user beta testing should begin in a couple of weeks. More NAS Parallel Benchmark tests were run, and results from MPI1, Fortran, OpenMP, and MPT tests turned up no issues.

In addition, InfiniBand was integrated into the first set of 2,048 cores, and 1-GigE connections were installed from Pleiades service nodes to the network switch.

06.20.08 - Second Pleiades Systems Installed on Schedule
The second of three Pleiades deliveries arrived on Tuesday, June 17th. The four racks housing 2,048 cores are in place with electrical connections completed. Preliminary testing and evaluation of the first 2,048 cores indicate a substantial performance improvement of Pleiades over RTJones, ranging from 5% to 30%. In addition, basic tools for monitoring Pleiades have been made available to the NAS Control Room staff.

06.06.08 - First Pleiades Systems Installed on Schedule
The first 2,048 cores (4 compute racks) of the new 20,480-core SGI Altix ICE system were installed within seven days of delivery on May 23rd. Each rack contains 512 processor cores and 512 GB of memory. Diagnostics and initial testing were completed in 2 days. Another 2,048 cores will arrive in June, with remaining hardware installed in July. Facility upgrades to power, cooling, and network systems are on schedule.

In addition, one Columbia node was relocated, and the first of two Pleiades RAID systems was installed.
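The figure of 512 cores per rack given in this entry can be used to cross-check the rack and core counts quoted elsewhere in this log. The sketch below is illustrative; it assumes every rack carries the same 512 cores, and the function and dictionary names are not from the source.

```python
CORES_PER_RACK = 512  # stated in the 06.06.08 entry


def cores_for(racks):
    """Total cores for a given rack count, assuming uniform 512-core racks."""
    return racks * CORES_PER_RACK


# Rack and core counts quoted in the log entries.
quoted = {
    4: 2048,     # one delivery of four racks (06.20.08)
    8: 4096,     # first two deliveries combined (08.18.08)
    40: 20480,   # original system order (09.02.08)
    100: 51200,  # TOP500 configuration (11.17.08)
}

all_consistent = all(cores_for(racks) == cores for racks, cores in quoted.items())
print(f"Quoted figures consistent: {all_consistent}")

# The 84-rack configuration is described only as "over 43,000 cores".
cores_84 = cores_for(84)
print(f"84 racks: {cores_84} cores")
```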


ADDITIONAL INFORMATION

+ NASA gives tour of latest supercomputer - EETimes article and video tour with Pleiades project manager Bob Ciotti

+ Green Supercomputing at NAS




Editor: Jill Dunbar
Webmaster: John Hardman
NASA Official: Rupak Biswas

Last Updated: August 26, 2009