Fujitsu/ANU (Area 4) Parallel Mathematics Library Development Project
The Area 4 project commenced in 1992 and focussed on researching and developing mathematical library algorithms and code for Fujitsu's VPP300/700 supercomputers. The original emphasis was on parallel algorithms that achieve high performance on the VPP series; this later broadened to include extensions to the existing library of vectorised mathematical subroutines for a single processor of the VPP. The project was completed in 1999.
Parallel and serial vector algorithms were developed in several areas, including:
- Random number generators
- Eigenvalue solvers
- Sparse Matrix solvers (direct and iterative)
- FFTs
- Multigrid preconditioner
- Wavelet Transforms
- Least squares solutions
Code developed was incorporated into the Fujitsu SSL2 scientific subroutine libraries, SSL2VP for one processor and SSL2VPP for multiple processors.
The Supercomputer Facility managed this project, and four of its staff were actively involved in the work. Dr M Kahn was responsible for day-to-day management of the project and for coordinating planning and activities across campus. Academic direction and leadership were provided by Professor R Brent, CSL, RSISE and Professor M Osborne, Program in Advanced Computation, CMA, SMS. Staff and students working on the project were located in SMS, Computer Science Laboratory, RSISE and ANUSF.
Several research fellow positions were funded under this project. Amongst those contributing to the project were Dr David Harrar II, Dr Eric Jiang, Mr David Miron, Dr Markus Hegland, Dr Lutz Grosz, Mr Geoff Keating, Dr Zbigniew Leyk, Dr Zhou Bing Bing, Dr Ole Nielsen and Mr Gavin Mercer.
Some Performance Comparisons
Here are some sample timings showing the performance achieved by code developed as part of the Area 4 project.
Parallel dense symmetric eigensolver
| Matrix size | No of processors | Elapsed time |
| 4000*4000 | 2 | 211 secs |
| 4000*4000 | 4 | 108 secs |
| 4000*4000 | 8 | 58 secs |
| 8000*8000 | 4 | 822 secs |
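The 4000*4000 timings above can be turned into relative speedup and parallel efficiency figures. The following is a minimal sketch of that arithmetic, not part of the project code. No single-processor time is listed for the parallel routine, so the 2-processor run is used as the baseline, and the function name `relative_scaling` is purely illustrative.

```python
# Elapsed times in seconds for the 4000*4000 case, copied from the table above.
# Keys are processor counts.
timings = {2: 211.0, 4: 108.0, 8: 58.0}


def relative_scaling(timings):
    """Return (processors, speedup, efficiency) relative to the smallest run.

    Assumption: the run with the fewest processors is the baseline, because
    no 1-processor time is given for the parallel routine.
    """
    base_p = min(timings)
    base_t = timings[base_p]
    rows = []
    for p in sorted(timings):
        speedup = base_t / timings[p]   # how much faster than the baseline
        ideal = p / base_p              # speedup under perfect scaling
        rows.append((p, speedup, speedup / ideal))
    return rows


for p, speedup, efficiency in relative_scaling(timings):
    print(f"{p} processors: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

On these figures, efficiency relative to the 2-processor run is about 98% on 4 processors and about 91% on 8.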
Vector (1 processor) dense symmetric eigensolver
| Matrix size | CPU time |
| 3000*3000 | 1.9 mins |
| 4000*4000 | 4.2 mins |
| 8000*8000 | 29.7 mins |
| 8153*8153 | 37 mins |
For comparison, the best existing SSL2 routine takes 6 mins for a 3000*3000 matrix.
LAPACK timings for DSYEV (symmetric real eigensolver)
| Matrix size | Time |
| 4000*4000 | 5.9 mins |
| 8000*8000 | 43.9 mins |
Hermitian dense eigensolver (vector code)
| Matrix size (complex) | Area 4 CPU time | SSL2 DHEIG2 |
| 2000*2000 | 93 secs | 280 secs |
| 3000*3000 | 307 secs | 775 secs |
| 4000*4000 | 694 secs | 3451 secs |
Parallel version
| Matrix size | No of processors | Elapsed time |
| 1024*1024 | 4 | 9.3 secs |
| 4096*4096 | 6 | 199 secs |
| 4096*4096 | 8 | 153 secs |
FFTs (vector code)
| Size of transform | Real routine DRFTMR | Complex routine DCFTMR |
| 1048576 | 0.072 secs | 0.110 secs |
| 1265625 | 0.146 secs | 0.145 secs |
| 2097152 | 0.149 secs | 0.208 secs |
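To compare FFT timings across transform lengths, one common convention is to quote a nominal rate based on 5 N log2(N) floating-point operations for a complex transform of length N, and half that for a real transform. The sketch below applies that convention to the timings above. The convention is an assumption introduced here for illustration and does not come from the project. It is not the true operation count of DRFTMR or DCFTMR, and it is only a rough yardstick for the length 1265625, which is not a power of two.

```python
import math

# (transform length, real-routine seconds, complex-routine seconds),
# copied from the table above.
fft_timings = [
    (1048576, 0.072, 0.110),
    (1265625, 0.146, 0.145),
    (2097152, 0.149, 0.208),
]


def nominal_mflops(n, seconds, real=False):
    """Nominal Mflop/s using the 5*N*log2(N) convention (an assumption).

    The count is halved for real-input transforms. This is a yardstick for
    comparison, not the actual number of operations executed.
    """
    flops = 5.0 * n * math.log2(n)
    if real:
        flops /= 2.0
    return flops / seconds / 1e6


for n, t_real, t_complex in fft_timings:
    print(f"N={n}: real {nominal_mflops(n, t_real, real=True):.0f} Mflop/s, "
          f"complex {nominal_mflops(n, t_complex):.0f} Mflop/s")
```

Under this convention the complex transform of length 1048576 (2^20) works out to roughly 953 nominal Mflop/s.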
Lanczos algorithm for sparse symmetric eigenproblems (sample times only; the run time depends on the problem, so definitive figures cannot be given)
| Matrix size | No of processors | Time in secs |
| 8*10^4 | 2 | 3.25 |
| 1.6*10^5 | 6 | 3.51 |
| 3.2*10^5 | 8 | 3.62 |
Normal random number generator
Time per normal random number generated is 5.4 to 11.4 nanoseconds
For further information contact Dr Margaret Kahn
margaret.kahn@anu.edu.au (06) 249 4541
