### Microwulf: Performance

Supercomputer performance is typically measured in flops -- the number of floating-point operations the supercomputer can perform each second. Early supercomputer performance was measured in megaflops (Mflops: 10^6 flops). Hardware advances increased subsequent supercomputers' performance to gigaflops (Gflops: 10^9 flops). Today's massively parallel supercomputers are measured in teraflops (Tflops: 10^12 flops), and tomorrow's systems will be measured in petaflops (Pflops: 10^15 flops).
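As a quick illustration of these prefixes, here is a minimal sketch that prints a raw flops rate using the largest prefix that fits. The helper name `format_flops` is hypothetical, chosen only for this example.

```python
# Scale prefixes from the paragraph above, largest first.
PREFIXES = [("Pflops", 1e15), ("Tflops", 1e12), ("Gflops", 1e9), ("Mflops", 1e6)]


def format_flops(flops: float) -> str:
    """Format a flops rate with the largest prefix that keeps the value >= 1."""
    for name, scale in PREFIXES:
        if flops >= scale:
            return f"{flops / scale:.2f} {name}"
    # Below one megaflops, report plain flops.
    return f"{flops:.0f} flops"


# Microwulf's measured rate, reported later on this page.
print(format_flops(26.25e9))
```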

When discussing supercomputer performance, you must also distinguish between:

• peak performance -- the theoretical maximum performance a given computer could possibly achieve; and
• measured performance -- the maximum performance a given computer actually achieves on a benchmark or other performance-measurement program.

Computer manufacturers often list a computer's performance using its peak performance, resulting in inflated performance claims. In actual usage, you are doing well if your computer's measured performance is 50-60% of its peak performance.
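To make the distinction concrete, here is a minimal sketch of how a peak figure is commonly estimated and how it compares with a measured one. It assumes the usual rule of thumb (cores x clock rate x floating-point operations per cycle). The machine and its numbers are hypothetical and are not Microwulf's.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak, assuming every core completes flops_per_cycle each cycle."""
    return cores * clock_ghz * flops_per_cycle


def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of the theoretical peak that a benchmark actually achieved."""
    return measured_gflops / peak


# Hypothetical machine (not Microwulf): 16 cores at 2.5 GHz, 4 flops per cycle.
peak = peak_gflops(cores=16, clock_ghz=2.5, flops_per_cycle=4)
measured = 88.0  # hypothetical benchmark result, in Gflops

print(f"peak {peak:.1f} Gflops, efficiency {efficiency(measured, peak):.0%}")
```

With these made-up numbers, the machine lands inside the 50-60% range described above.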

One final factor in measuring performance is the precision of the floating-point operations. Most high-performance computations use double-precision operations. These can be much more time-consuming than single-precision operations, so be careful not to compare a single-precision figure against a double-precision one -- if you do, you're comparing apples to oranges.

The standard benchmark for measuring supercomputer performance (the one used by the top500.org supercomputer list) is High-Performance Linpack (HPL), a program that exercises and reports a supercomputer's double-precision floating-point performance. HPL depends on the Basic Linear Algebra Subprograms (BLAS) libraries, so you must install a version of BLAS before you can install and run HPL.

In March 2007, we benchmarked Microwulf using HPL and Goto BLAS. After compiling and installing each package, we ran the standard, double-precision version of HPL, varying its parameters as follows: PxQ between {1x8, 2x4}; NB between {100, 120, 140, 160, 180, 200}; and N over increasing values, starting with 1,000. For the following parameter values:

PxQ = 2x4; NB = 160; N = 30,000

HPL reported 26.25 Gflops on its WR00R2R4 operation. Microwulf also exceeded 26 Gflops on other operations, but 26.25 Gflops was our maximum.
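The parameter sweep described above can be sketched as a simple enumeration. This is only an illustration of the search space, not the procedure we actually scripted. The process grids and block sizes come from the text, but the intermediate value in `PROBLEM_SIZES` is an assumption, since only the starting value (1,000) and the best value (30,000) are given.

```python
from itertools import product

GRIDS = [(1, 8), (2, 4)]          # P x Q process grids tried
BLOCK_SIZES = range(100, 201, 20)  # NB = 100, 120, ..., 200
# Assumption: only 1,000 and 30,000 come from the text; 10,000 is illustrative.
PROBLEM_SIZES = [1_000, 10_000, 30_000]


def sweep():
    """Yield every combination of process grid, block size, and problem size."""
    for (p, q), nb, n in product(GRIDS, BLOCK_SIZES, PROBLEM_SIZES):
        yield {"P": p, "Q": q, "NB": nb, "N": n}


configs = list(sweep())
print(f"{len(configs)} configurations to benchmark")
```

In practice, HPL's input file (HPL.dat) accepts several values for each of these parameters and runs the combinations itself, so a script like this is mainly useful for planning how many runs a sweep implies.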

This is significant computational power. For example, according to the top500 list, a 1996 Cray T3D-256 provided just 25.3 Gflops of measured performance.
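To put 26.25 Gflops in concrete terms, the sketch below estimates how much arithmetic and memory a problem of order N = 30,000 involves. It assumes the operation count commonly attributed to HPL, (2/3)N^3 + (3/2)N^2, and counts only the N x N matrix of 8-byte doubles. Treat both as approximations rather than exact accounting of what HPL does.

```python
def hpl_flop_count(n: int) -> float:
    """Approximate operations to solve an n x n dense system by LU factorization."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def solve_seconds(n: int, gflops: float) -> float:
    """Run time implied by a sustained rate of `gflops` on a problem of order n."""
    return hpl_flop_count(n) / (gflops * 1e9)


def matrix_bytes(n: int) -> int:
    """Storage for the n x n double-precision (8-byte) matrix entries alone."""
    return 8 * n * n


n, rate = 30_000, 26.25  # values from the best run reported above
print(f"{hpl_flop_count(n):.3e} operations")
print(f"{solve_seconds(n, rate) / 60:.1f} minutes at {rate} Gflops")
print(f"{matrix_bytes(n) / 1e9:.1f} GB for the matrix")
```

Under these assumptions, the run works out to roughly eleven minutes of sustained computation on a matrix of about 7.2 GB.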

Since we benchmarked Microwulf, Advanced Clustering Technologies has published a convenient web-based calculator that removes much of the trial and error from tuning HPL.
