Eigenvalue SoLvers for Petaflop-Applications (ELPA)

http://elpa.mpcdf.mpg.de

The ELPA library was originally created by the ELPA consortium, consisting of the following organizations:

Max Planck Computing and Data Facility (MPCDF) formerly known as Rechenzentrum Garching der Max-Planck-Gesellschaft (RZG),
Bergische Universität Wuppertal, Lehrstuhl für angewandte Informatik,
Technische Universität München, Lehrstuhl für Informatik mit Schwerpunkt Wissenschaftliches Rechnen,
Fritz-Haber-Institut, Berlin, Abt. Theorie,
Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig, Abt. Komplexe Strukturen in Biologie und Kognition, and
IBM Deutschland GmbH

Some parts and enhancements of ELPA have been contributed and authored by the Intel Corporation and Nvidia Corporation, which are not part of the ELPA consortium.

Maintenance and development of the ELPA library is done by the Max Planck Computing and Data Facility (MPCDF).

Further support of the ELPA library is provided by the ELPA-AEO consortium, consisting of the following organizations:

Max Planck Computing and Data Facility (MPCDF) formerly known as Rechenzentrum Garching der Max-Planck-Gesellschaft (RZG),
Bergische Universität Wuppertal, Lehrstuhl für angewandte Informatik,
Technische Universität München, Lehrstuhl für Informatik mit Schwerpunkt Wissenschaftliches Rechnen,
Technische Universität München, Lehrstuhl für theoretische Chemie,
Fritz-Haber-Institut, Berlin, Abt. Theorie

Contributions to the ELPA source have been authored by (in alphabetical order):

T. Auckenthaler, Volker Blum, A. Heinecke, L. Huedepohl, R. Johanni, Werner Jürgens, Pavel Kus, and A. Marek

All the important information is in the elpa_api::elpa_t derived type.

Abstract definition of the elpa_t type

In MPI builds, ELPA requires the matrix to be block-cyclic distributed, so the user has to set up this distribution before calling ELPA. Experience shows that it is very important to check the return code of 'descinit' to verify that the block-cyclic distribution is valid. ELPA relies on a valid block-cyclic distribution and might show unexpected behavior if this has not been ensured before calling ELPA.
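To illustrate what a block-cyclic distribution implies for the local matrix sizes, the following minimal C sketch computes how many rows (or columns) of a global dimension end up on a given process. It mirrors the counting logic of the ScaLAPACK helper routine numroc, but the function name local_count and its parameter names are assumptions chosen for this illustration and are not part of the ELPA API. In a real application one would call numroc and descinit from ScaLAPACK directly.

```c
#include <assert.h>

/* Hypothetical helper (not part of ELPA): number of rows or columns of a
 * global dimension n that a process owns when n is split into blocks of
 * size nb and the blocks are dealt out round-robin over nprocs processes.
 *   iproc    - coordinate of this process in the process row/column
 *   isrcproc - coordinate of the process that owns the first block
 */
int local_count(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    assert(iproc >= 0 && iproc < nprocs);
    assert(isrcproc >= 0 && isrcproc < nprocs);

    /* distance of this process from the process holding the first block */
    int mydist = (nprocs + iproc - isrcproc) % nprocs;

    /* number of complete blocks in the global dimension */
    int nblocks = n / nb;

    /* every process gets at least this many complete blocks */
    int count = (nblocks / nprocs) * nb;

    /* complete blocks left over after the even split */
    int extrablks = nblocks % nprocs;

    if (mydist < extrablks) {
        /* this process receives one additional complete block */
        count += nb;
    } else if (mydist == extrablks) {
        /* this process receives the trailing partial block, if any */
        count += n % nb;
    }
    return count;
}
```

For example, with na = 10, nblk = 2 and two process rows, local_count(10, 2, 0, 0, 2) gives 6 and local_count(10, 2, 1, 0, 2) gives 4. Values of this kind are what the synopsis passes to ELPA as "local_nrows" and "local_ncols".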

A typical usage of ELPA might look like this:

Fortran synopsis

use elpa
class(elpa_t), pointer :: elpaInstance
integer :: success

! We urge the user to always check the error code of all ELPA functions

if (elpa_init(20211125) /= elpa_ok) then
  print *, "ELPA API version not supported"
  stop
endif
elpaInstance => elpa_allocate(success)
if (success /= elpa_ok) then
  print *, "Could not allocate ELPA"
endif

! set parameters describing the matrix and its MPI distribution
call elpaInstance%set("na", na, success)
if (success /= elpa_ok) then
  print *, "Could not set entry"
endif
call elpaInstance%set("nev", nev, success)
! check success code ...

call elpaInstance%set("local_nrows", na_rows, success)
! check success code ...

call elpaInstance%set("local_ncols", na_cols, success)
call elpaInstance%set("nblk", nblk, success)
call elpaInstance%set("mpi_comm_parent", mpi_comm_world, success)
call elpaInstance%set("process_row", my_prow, success)
call elpaInstance%set("process_col", my_pcol, success)

! set up the elpa object
success = elpaInstance%setup()
if (success /= elpa_ok) then
  print *, "Could not setup ELPA object"
endif
 
! settings for GPU
call elpaInstance%set("gpu", 1, success) ! 1=on, 0=off
! in case of GPU usage you can choose whether ELPA
! should automatically assign each MPI task to a certain GPU
! (this is the default) or whether you want to set this assignment
! for _each_ task yourself
! set the assignment yourself (only using one task here and assigning it
! to GPU id 1)
if (my_rank .eq. 0) call elpaInstance%set("use_gpu_id", 1, success)

! if desired, set tunable run-time options
! here we want to use the 2-stage solver
call elpaInstance%set("solver", elpa_solver_2stage, success)

! and set a specific kernel (must be supported on the machine)
! the CALLING order is important: you FIRST have to set the solver to ELPA_SOLVER_2STAGE
! and THEN you can choose a kernel other than the DEFAULT kernel
call elpaInstance%set("real_kernel", elpa_2stage_real_avx_block2, success)

! ... set and get all other options that are desired

! if wanted you can store the settings and load them in another program
call elpaInstance%store_settings("save_to_disk.txt", success)

! use the eigenvectors method to solve the eigenvalue problem and obtain
! eigenvalues and eigenvectors
! other possible methods are described in the elpa_api::elpa_t derived type
call elpaInstance%eigenvectors(a, ev, z, success)

! cleanup
call elpa_deallocate(elpaInstance, success)

call elpa_uninit()

C synopsis

#include <stdio.h>
#include <stdlib.h>
#include <elpa/elpa.h>

elpa_t handle;
int error;

/* We urge the user to always check the error code of all ELPA functions */

if (elpa_init(20211125) != ELPA_OK) {
  fprintf(stderr, "Error: ELPA API version not supported\n");
  exit(1);
}

handle = elpa_allocate(&error);
if (error != ELPA_OK) {
  /* handle the error, e.g. report it and abort */
}

/* Set parameters describing the matrix and its MPI distribution */
elpa_set(handle, "na", na, &error);
elpa_set(handle, "nev", nev, &error);
elpa_set(handle, "local_nrows", na_rows, &error);
elpa_set(handle, "local_ncols", na_cols, &error);
elpa_set(handle, "nblk", nblk, &error);
elpa_set(handle, "mpi_comm_parent", MPI_Comm_c2f(MPI_COMM_WORLD), &error);
elpa_set(handle, "process_row", my_prow, &error);
elpa_set(handle, "process_col", my_pcol, &error);
 
/* Setup */
error = elpa_setup(handle);
 
/* if desired, set tunable run-time options */
/* here we want to use the 2-stage solver */
elpa_set(handle, "solver", ELPA_SOLVER_2STAGE, &error);
 
/* settings for GPU */
elpa_set(handle, "gpu", 1, &error);  /* 1=on, 0=off */
/* in case of GPU usage you can choose whether ELPA
   should automatically assign each MPI task to a certain GPU
   (this is the default) or whether you want to set this assignment
   for _each_ task yourself
   set the assignment yourself (only using one task here and assigning it
   to GPU id 1) */
if (my_rank == 0) elpa_set(handle, "use_gpu_id", 1, &error);
 
/* and set a specific kernel (must be supported on the machine)
   the CALLING order is important: you FIRST have to set the solver to ELPA_SOLVER_2STAGE
   and THEN you can choose a kernel other than the DEFAULT kernel */
elpa_set(handle, "real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, &error);

/* ... set and get all other options that are desired */

/* if you want you can store the settings and load them in another program */
elpa_store_settings(handle, "save_to_disk.txt", &error);
 
/* use elpa_eigenvectors to solve the eigenvalue problem and obtain
   eigenvalues and eigenvectors */
/* other possible methods are described in the elpa_api::elpa_t derived type */
elpa_eigenvectors(handle, a, ev, z, &error);
 
/* cleanup */
elpa_deallocate(handle, &error);
elpa_uninit();

Autotuning can be used like this:

Fortran synopsis

use elpa
class(elpa_t), pointer :: elpaInstance
class(elpa_autotune_t), pointer :: tune_state
integer :: success, iter

if (elpa_init(20211125) /= elpa_ok) then
  print *, "ELPA API version not supported"
  stop
endif
elpaInstance => elpa_allocate(success)

! set parameters describing the matrix and its MPI distribution
call elpaInstance%set("na", na, success)
call elpaInstance%set("nev", nev, success)
call elpaInstance%set("local_nrows", na_rows, success)
call elpaInstance%set("local_ncols", na_cols, success)
call elpaInstance%set("nblk", nblk, success)
call elpaInstance%set("mpi_comm_parent", mpi_comm_world, success)
call elpaInstance%set("process_row", my_prow, success)
call elpaInstance%set("process_col", my_pcol, success)

! set up the elpa object
success = elpaInstance%setup()

! create autotune object
tune_state => elpaInstance%autotune_setup(elpa_autotune_fast, elpa_autotune_domain_real, success)

! you can set some options, these will then be FIXED for the autotuning step
! if desired, set tunable run-time options
! here we want to use the 2-stage solver
call elpaInstance%set("solver", elpa_solver_2stage, success)

! and set a specific kernel (must be supported on the machine)
! the CALLING order is important: you FIRST have to set the solver to ELPA_SOLVER_2STAGE
! and THEN you can choose a kernel other than the DEFAULT kernel
call elpaInstance%set("real_kernel", elpa_2stage_real_avx_block2, success)

! ... set and get all other options that are desired

iter = 0
do while (elpaInstance%autotune_step(tune_state, success))
  iter = iter + 1
  call elpaInstance%eigenvectors(a, ev, z, success)

  ! if needed you can save the autotune state at any point
  ! and resume it later (max_iter is a user-chosen iteration limit)
  if (iter > max_iter) then
    call elpaInstance%autotune_save_state(tune_state, "autotune_checkpoint.txt", success)
    exit
  endif
enddo

! apply the best settings found by the autotuning
call elpaInstance%autotune_set_best(tune_state, success)

! store _TUNED_ ELPA object, if needed
call elpaInstance%store_settings("autotuned_object.txt", success)

! deallocate autotune object
call elpa_autotune_deallocate(tune_state, success)

! cleanup
call elpa_deallocate(elpaInstance, success)

call elpa_uninit()

More examples can be found in the folder "test", which contains Fortran and C example programs.