The Australian National University
ANU Supercomputer Facility

Fujitsu/ANU (Area 4) Parallel Mathematics Library Development Project

The Area 4 project commenced in 1992 and focussed on researching and developing mathematical library algorithms and code for Fujitsu's VPP300/700 supercomputers. The original emphasis was on parallel algorithms that achieved high performance on the VPP series; this later broadened to include extensions to the existing library of vectorised mathematical subroutines for a single VPP processor. The project was completed in 1999.

Parallel and serial vector algorithms were developed in several areas, including:

  • Random number generators
  • Eigenvalue solvers
  • Sparse Matrix solvers (direct and iterative)
  • FFTs
  • Multigrid preconditioner
  • Wavelet Transforms
  • Least squares solutions

Code developed was incorporated into the Fujitsu SSL2 scientific subroutine libraries, SSL2VP for one processor and SSL2VPP for multiple processors.

The Supercomputer Facility managed this project, and four of its staff were actively involved in the work. Dr M Kahn was responsible for day-to-day management of the project and for coordinating planning and activities across campus. Academic direction and leadership were provided by Professor R Brent, CSL, RSISE and Professor M Osborne, Program in Advanced Computation, CMA, SMS. Staff and students working on the project were located in SMS, Computer Science Laboratory, RSISE and ANUSF.

Several research fellow positions were funded under this project. Amongst those contributing to the project were Dr David Harrar II, Dr Eric Jiang, Mr David Miron, Dr Markus Hegland, Dr Lutz Grosz, Mr Geoff Keating, Dr Zbigniew Leyk, Dr Zhou Bing Bing, Dr Ole Nielsen and Mr Gavin Mercer.

Some Performance Comparisons

The following sample timings show the performance achieved by code developed as part of the Area 4 project.

Parallel dense symmetric eigensolver


Matrix size   No of processors   Elapsed time

4000*4000     2                  211 secs
4000*4000     4                  108 secs
4000*4000     8                  58 secs
8000*8000     4                  822 secs
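
One way to read the 4000*4000 timings above is as speedup and parallel efficiency relative to the 2-processor run. The following is a minimal sketch of that arithmetic; the function name `relative_scaling` is hypothetical and the 2-processor baseline is an assumption made here, since no single-processor elapsed time is listed for the parallel code.

```python
# Elapsed times (seconds) for the 4000*4000 parallel dense symmetric
# eigensolver, copied from the table above: processors -> seconds.
timings = {2: 211.0, 4: 108.0, 8: 58.0}


def relative_scaling(timings):
    """Return {p: (speedup, efficiency)} relative to the smallest processor count.

    speedup    = baseline time / time on p processors
    efficiency = speedup / (p / baseline processor count)
    """
    base_p = min(timings)
    base_t = timings[base_p]
    result = {}
    for p in sorted(timings):
        speedup = base_t / timings[p]
        efficiency = speedup / (p / base_p)
        result[p] = (speedup, efficiency)
    return result


scaling = relative_scaling(timings)
for p, (speedup, efficiency) in scaling.items():
    print(f"{p} processors: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

With these figures, going from 2 to 4 processors gives a speedup of about 1.95, and from 2 to 8 processors about 3.64, which is roughly 91% efficiency relative to the 2-processor run.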

Vector (1 processor) dense symmetric eigensolver


Matrix size CPU time

3000*3000 1.9 mins
4000*4000 4.2 mins
8000*8000 29.7 mins
8153*8153 37 mins

The best existing SSL2 routine takes 6 mins for a 3000*3000 matrix.

LAPACK timings for DSYEV (symmetric real eigensolver)


4000*4000 5.9 mins
8000*8000 43.9 mins

Hermitian dense eigensolver (vector code)


Matrix size (complex)   Area 4 CPU time   SSL2 DHEIG2

2000*2000               93 secs           280 secs
3000*3000               307 secs          775 secs
4000*4000               694 secs          3451 secs
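
The comparison above can be summarised as a ratio of CPU times for each matrix size. This is a small sketch of that calculation using only the figures in the table; the helper name `speedup_over_baseline` is hypothetical.

```python
# (matrix order, Area 4 CPU seconds, SSL2 DHEIG2 CPU seconds), from the table above.
rows = [
    (2000, 93.0, 280.0),
    (3000, 307.0, 775.0),
    (4000, 694.0, 3451.0),
]


def speedup_over_baseline(rows):
    """Return {matrix order: baseline time / new time} for each table row."""
    return {n: baseline / new for n, new, baseline in rows}


ratios = speedup_over_baseline(rows)
for n, ratio in ratios.items():
    print(f"{n}*{n}: Area 4 code is {ratio:.1f} times faster than DHEIG2")
```

On these figures the Area 4 vector code is about 3.0, 2.5 and 5.0 times faster than DHEIG2 for the three sizes respectively.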

Parallel version


Matrix size   No of processors   Elapsed time

1024*1024     4                  9.3 secs
4096*4096     6                  199 secs
4096*4096     8                  153 secs

FFTs (vector code)


Size of transform   Real routine DRFTMR   Complex routine DCFTMR

1048576             0.072 secs            0.110 secs
1265625             0.146 secs            0.145 secs
2097152             0.149 secs            0.208 secs
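
As a rough way to interpret FFT timings like those above, benchmarks commonly quote a nominal rate based on 5 N log2(N) floating-point operations for a complex transform of length N, and half that for a real transform. The sketch below applies that convention. It is an assumption introduced here, not the project's own measure: the nominal count is not the number of operations DRFTMR or DCFTMR actually perform, and the function name `nominal_fft_mflops` is hypothetical.

```python
import math


def nominal_fft_mflops(n, seconds, real_input=False):
    """Nominal Mflop/s for a length-n FFT that took `seconds`.

    Uses the conventional operation count 5*n*log2(n) for complex data,
    halved for real data. This is a benchmarking convention, not a count
    of the operations the routine really executes.
    """
    flops = 5.0 * n * math.log2(n)
    if real_input:
        flops /= 2.0
    return flops / seconds / 1e6


# Timings for the 1048576-point transforms, taken from the table above.
complex_rate = nominal_fft_mflops(1048576, 0.110)
real_rate = nominal_fft_mflops(1048576, 0.072, real_input=True)
print(f"complex: {complex_rate:.0f} nominal Mflop/s")
print(f"real:    {real_rate:.0f} nominal Mflop/s")
```

Under that convention, the 1048576-point complex transform in 0.110 secs corresponds to roughly 950 nominal Mflop/s.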

Lanczos algorithm for sparse symmetric eigenproblems (sample times only; the run time depends on the problem, so definitive figures cannot be given)


Matrix size No of processors Time in secs

8*10^4 2 3.25
1.6*10^5 6 3.51
3.2*10^5 8 3.62

Random Normal number generator

Time per normal random number generated is 5.4 to 11.4 nanosecs


For further information contact Dr Margaret Kahn

margaret.kahn@anu.edu.au (06) 249 4541