12.11.08 - Ceremony Held for Official Pleiades Launch
A ribbon-cutting ceremony to officially recognize the installation and operation of the Pleiades supercomputer was held at the NAS facility at Ames. The event included guest speakers from industry partners
and NASA management, tours of the NAS computer floor and the hyperwall-2 visualization system, and a Q&A session. In addition, Pleiades team members gave interviews to members of the local news media
in attendance.
11.17.08 - Pleiades Ranked Third-fastest Supercomputer on TOP500 List
Pleiades attained the number three spot on the TOP500 list of the world's most powerful computers, announced at the International Conference for High-Performance Computing, Networking, Storage, and
Analysis (SC08) in Austin. The system's 12,800 Intel® Xeon® quad-core processors (51,200 cores, 100 racks) clocked in at 487 trillion floating point operations per second (teraflops) on the LINPACK
benchmark.
In addition, Pleiades came in at #22 (233 megaflops per watt) on the list of most energy-efficient supercomputers in the world on the November 2008 Green500 list. Importantly, when combining energy
efficiency and computational power, Pleiades comes in at #2 among the world's general-purpose supercomputers.
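The figures quoted in this entry can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the implied power draw is an illustration that assumes the Green500 ratio is simply the LINPACK result divided by measured power, which the entry does not state.

```python
# Figures quoted in the 11.17.08 entry.
processors = 12_800          # quad-core Intel Xeon processors
cores_per_processor = 4
racks = 100
linpack_teraflops = 487      # LINPACK result
megaflops_per_watt = 233     # Green500 figure

total_cores = processors * cores_per_processor
cores_per_rack = total_cores // racks

# Implied power draw. Assumption: the Green500 ratio is the LINPACK
# result divided by measured power; the entry does not say so.
linpack_megaflops = linpack_teraflops * 1_000_000
implied_megawatts = linpack_megaflops / megaflops_per_watt / 1_000_000

print(total_cores)                  # 51200
print(cores_per_rack)               # 512
print(round(implied_megawatts, 2))  # 2.09
```

The 512 cores per rack matches the per-rack figure given in the 06.06.08 entry.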
11.10.08 - Fine-tuning and Problem Resolution Continues
NAS successfully resolved a number of system issues, including out-of-memory conditions and reboots, resetting of switches, and Saturn (home filesystem) crashes. The systems team created a new Lustre
filesystem (219 terabytes), primarily for an ECCO (Estimating the Circulation & Climate of the Ocean) code run for the Science Mission Directorate. 14,000 nodes were made available to users representing
all NASA mission directorates.
NAS control room and applications team members helped support Pleiades and its users by creating queues and fine-tuning the Portable Batch System.
10.13.08 - Pleiades has Successful 92-Rack LINPACK Run
NAS successfully completed Pleiades' 92-rack run on the LINPACK benchmark. To accomplish the run, NAS and SGI engineers first created the 64-rack configuration by connecting two 32-rack systems via an
overhead trellis containing 512 InfiniBand (IB) cables. NAS staff also installed IB cables and a new operating system, set up home filesystems, and conducted initial systems testing. The benchmark
performance yielded 462 teraflops with 81.7% efficiency.
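As a rough check on the efficiency figure, LINPACK efficiency is the measured result divided by theoretical peak. The sketch below works backward from the numbers above to the implied peak. It assumes all 92 racks hold 512 cores each (the per-rack figure from the 06.06.08 entry), which this entry does not confirm.

```python
# Figures quoted in the 10.13.08 entry.
racks = 92
measured_teraflops = 462
efficiency = 0.817

# Assumption: every rack holds 512 cores, as stated in the 06.06.08 entry.
cores_per_rack = 512

cores = racks * cores_per_rack
implied_peak_teraflops = measured_teraflops / efficiency
implied_gigaflops_per_core = implied_peak_teraflops * 1000 / cores

print(cores)                                 # 47104
print(round(implied_peak_teraflops, 1))      # 565.5
print(round(implied_gigaflops_per_core, 1))  # 12.0
```

Under these assumptions the run implies a peak of about 12 gigaflops per core.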
A significant number of new copper and fiber cables were installed to support a 100-rack LINPACK run (see 10.06.08, below). The 100-rack test was attempted, but a blown circuit breaker caused Pleiades to
lose power; the breaker was quickly replaced.
10.06.08 - NAS Readies for 100-Rack SGI ICE Benchmark
NAS' systems, security, and network teams worked overtime to reconfigure Pleiades 3 and Pleiades 4 inside the High End Computing Capability (HECC) enclave for internal staff testing. A trellis tray was
designed as a possible solution for routing the 200 fiber cables required to interconnect Pleiades into a new 92-rack configuration.
NAS also geared up for a LINPACK benchmark run next week, and researched a possible 100-rack SGI ICE run by connecting 8 RTJones racks with Pleiades—pending arrival of more InfiniBand cables.
09.29.08 - Pleiades Now in 64-Rack Configuration
Over the weekend, NAS created a 64-rack, 32,768-core Pleiades configuration, connecting P2 and P3 (32 racks each) via an overhead trellis system containing 512 InfiniBand (IB) cables. SGI engineers ran
extensive IB diagnostics, and NAS staff completed additional tests. The necessary IB cables were ordered for the final 84-rack configuration.
The NAS Application Performance and Productivity team continued assisting users with testing various codes and libraries on Pleiades.
09.22.08 - Progress Continues on Pleiades Build, Test
With the first 8 racks of Pleiades (P1) stable and in "production" mode, the NAS team focused on making some adjustments to P2, the 32-rack system, and on additional user testing. The Systems team added
extra memory to P2, bringing one rack to 2 gigabytes per core. They also made progress on resolving an out-of-memory problem that surfaced. In addition, all 16 compute racks for P3 were installed, along
with system software, and testing began. Systems staff discovered a defective compute blade and a defective dual in-line memory module (DIMM), which were quickly replaced.
The Application Performance and Productivity (APP) group continued testing, and spent the necessary time with advanced users to explain the subtleties of certain job runs on Pleiades. A beta test
user ran a large grid of the V-22 tiltrotor configuration (150 million total grid points), which helped establish far-field boundary condition behavior in hover, and provided a good stress test for the
new machine.
09.02.08 - Original 40 Pleiades Racks on Floor
The original 40-rack (20,480-core) Pleiades system was installed on the NAS computer floor, in an 8-rack and 32-rack configuration. System diagnostics on 32 racks were completed, along with security
clearances. NAS continued working on diagnostics and LINPACK testing; on 15,256 cores, the LINPACK number came in at 158 teraflops—81.2% efficiency.
Performance/scaling comparisons of applications such as Overflow, CART3D, ECCO, and USM3D on Columbia, RTJones, and Pleiades also continued. One "ultra-large" case (96 million grid points) showed that
Pleiades is 40-45% faster than RTJones.
Preparations for the additional 44 Pleiades racks purchased at the end of July progressed, including equipment moves, installation of 2 new RAIDs, more power distribution units, and associated whips
(electrical cords that run from the power distribution units on the floor to the computer racks). A shipment of 16 racks arrived on August 29th, with the next 16 scheduled for delivery on September 12th.
In the current configuration, Pleiades will comprise 84 racks total.
08.18.08 - First 32 Pleiades Racks Tested
Engineers began an extensive (3-5 week) hardware testing phase on the 32 SGI ICE 8200EX racks installed last week. These new racks were initially tested independently of the 8 racks installed in June,
which allowed the original racks to continue serving users and Applications group staff while the 32 new racks were carefully analyzed and tuned for performance.
Diagnostics and benchmarks will ensure that performance scales on this large system; Pleiades is the largest SGI ICE installation to date. Tests so far have turned up few hardware failures. NAS staff also
ran experimental workload applications using 4,096 cores (8 racks) to check for system stability.
A new file system that doubles the bandwidth and space for Pleiades was created and tested, and NAS staff worked with SGI to integrate the 300-gigabyte Lustre metadata server and test InfiniBand (IB)
modifications. Nearly 1,800 IB cables were connected to Pleiades.
08.01.08 - Pleiades Gets 32 Additional Systems
The NAS facility buzzed with activity this week, with the arrival of 32 additional SGI ICE 8200EX racks, the installation of 6 new (and 2 relocated) power distribution units, installation of a pump
package for the Pleiades water cooling system, relocation of 16 Columbia racks to Ames building 233, and general computer floor clean-up. The 48 disk drives on the Lustre metadata server for Pleiades were
upgraded from 73 gigabytes (GB) to 300 GB each. All system deliveries, as well as electrical and plumbing work for the initial system order, neared completion.
A second purchase order associated with the NAS Technology Refresh was placed on July 28th for 44 additional SGI ICE racks—doubling the size of Pleiades to over 43,000 cores.
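The rack and core totals across the two purchase orders can be tallied as follows. This is a minimal sketch that assumes the 512-cores-per-rack figure from the 06.06.08 entry applies to every rack in both orders.

```python
# Assumption: 512 cores per rack (06.06.08 entry) holds for all racks.
cores_per_rack = 512

initial_racks = 40      # original order (8 + 32 racks)
additional_racks = 44   # second purchase order placed July 28th

initial_cores = initial_racks * cores_per_rack
total_racks = initial_racks + additional_racks
total_cores = total_racks * cores_per_rack

print(initial_cores)  # 20480
print(total_racks)    # 84
print(total_cores)    # 43008
```

The result agrees with the "over 43,000 cores" quoted above and with the 84-rack total mentioned in the 09.02.08 entry.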
07.25.08 - Pleiades Testing Nears Completion, Facilities Work Continues
Pleiades and InfiniBand router testing continued during the week on the 8 compute cabinets (4,096 cores) installed. The NAS Applications group continued to characterize performance for various applications.
One user was invited to begin testing and additional users will be given access next week. Ongoing porting and scaling work showed that ModelE runs 30% faster on Pleiades than on RTJones.
The NAS facility power panels were upgraded to support the additional power requirements of Pleiades, with 1200-amp panels replacing old 800-amp panels. Power outages associated with the facility upgrade
project caused some processing issues that were handled quickly.
07.17.08 - Installation Effort Quickly Overcomes Small Setbacks
The facility upgrade project continued over the past two weeks, highlighted by testing and "energizing" of the new power complex. Additional power distribution units for Pleiades were delivered, and
installation and power panel activation began. Due to the sheer volume and complexity of changes, some unexpected outages occurred. NAS staff acted quickly to ameliorate the effects of the outages.
Application testing on Pleiades continued, and most applications showed a 20-35% performance improvement over RTJones. A set of NAS Parallel Benchmark performance tests was conducted on Pleiades. The
system demonstrated a roughly 30% per-processor performance improvement over RTJones across a wide range of processor counts.
07.03.08 - Facility Upgrade for Pleiades Completed, Testing Continues
The facility shutdown on June 28th to upgrade the chilled water system to accommodate Pleiades went smoothly, with plumbing and electrical work completed, including installation of a 450-ton chiller.
Electrical breakers and panels arrived mid-week, and preparations were made to pour a concrete pad for the pump package delivery. The computing systems returned to service on schedule, and some associated
hardware issues were handled.
Application testing continued and comparisons were made to theoretical performance numbers. CART3D and FUN3D testing continued, and the latter's results for the sample data set showed that Pleiades is
35-38% faster than RTJones, using 64 cores (8 cores in each of 8 nodes). The NAS Parallel Benchmarks ran successfully on the four newest Pleiades racks, with performance numbers very similar to those
obtained previously on the first four racks. Several workload stress tests were run, and verification and validation for each application were completed.
Testing of the Pleiades filesystem also began, and an approach to storage monitoring is in the works.
06.27.08 - Test Work Steps up on Pleiades
All 4,096 cores of the first two Pleiades deliveries have been readied for substantial applications testing, with security checks on the second set of Pleiades systems completed. The legacy parallel code
FUN3D was placed on Pleiades to begin performance testing. FUN3D is integral to aerodynamic design work within NASA and industry. Tests were also run on other workload applications including CART3D.
Limited user beta testing should begin in a couple of weeks. More NAS Parallel Benchmark tests were run, and results from MPI1, Fortran, OpenMP, and MPT tests turned up no issues.
In addition, InfiniBand was integrated into the first set of 2,048 cores, and 1-GigE connections were installed from Pleiades service nodes to the network switch.
06.20.08 - Second Pleiades Systems Installed on Schedule
The second of three Pleiades deliveries arrived on Tuesday, June 17th. The four racks housing 2,048 cores are in place with electrical connections completed. Preliminary testing and evaluation of the
first 2,048 cores indicate a performance improvement of 5-30% for Pleiades over RTJones. In addition, basic tools for monitoring Pleiades have been made available to the NAS Control
Room staff.
06.06.08 - First Pleiades Systems Installed on Schedule
The first 2,048 cores (4 compute racks) of the new 20,480-core SGI Altix ICE system were installed within seven days of delivery on May 23rd. Each rack contains 512 processor cores and 512 GB
of memory. Diagnostics and initial testing were completed in 2 days. Another 2,048 cores will arrive in June, with remaining hardware installed in July. Facility upgrades to power, cooling, and network
systems are on schedule.
In addition, one Columbia node was relocated, and the first of two Pleiades RAID systems was installed.