Eigenvalue SoLvers for Petaflop-Applications (ELPA) 2021.11.002

http://elpa.mpcdf.mpg.de
The ELPA library was originally created by the ELPA consortium, consisting of the following organizations:
  • Max Planck Computing and Data Facility (MPCDF) formerly known as Rechenzentrum Garching der Max-Planck-Gesellschaft (RZG),
  • Bergische Universität Wuppertal, Lehrstuhl für angewandte Informatik,
  • Technische Universität München, Lehrstuhl für Informatik mit Schwerpunkt Wissenschaftliches Rechnen,
  • Fritz-Haber-Institut, Berlin, Abt. Theorie,
  • Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig, Abt. Komplexe Strukturen in Biologie und Kognition, and
  • IBM Deutschland GmbH

Some parts and enhancements of ELPA have been contributed and authored by Intel Corporation and NVIDIA Corporation, which are not part of the ELPA consortium.

Maintenance and development of the ELPA library are carried out by the Max Planck Computing and Data Facility (MPCDF).

Further support of the ELPA library is provided by the ELPA-AEO consortium, consisting of the following organizations:

  • Max Planck Computing and Data Facility (MPCDF) formerly known as Rechenzentrum Garching der Max-Planck-Gesellschaft (RZG),
  • Bergische Universität Wuppertal, Lehrstuhl für angewandte Informatik,
  • Technische Universität München, Lehrstuhl für Informatik mit Schwerpunkt Wissenschaftliches Rechnen,
  • Technische Universität München, Lehrstuhl für theoretische Chemie,
  • Fritz-Haber-Institut, Berlin, Abt. Theorie

Contributions to the ELPA source have been authored by (in alphabetical order):

T. Auckenthaler, Volker Blum, A. Heinecke, L. Huedepohl, R. Johanni, Werner Jürgens, Pavel Kus, and A. Marek

All the important information is in the elpa_api::elpa_t derived type


Since ELPA requires (in the case of MPI builds) that the matrix is distributed in a block-cyclic way, the user has to ensure this distribution before calling ELPA. Experience shows that it is very important that the user checks the return code of 'descinit' to verify that the block-cyclic distribution is valid. Note that ELPA relies on a valid block-cyclic distribution and might show unexpected behavior if this has not been ensured before calling ELPA.
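A minimal sketch of such a check, using ScaLAPACK's descinit (the BLACS context my_blacs_ctxt and the sizes na, nblk, na_rows are assumptions chosen to be consistent with the synopsis below):

integer :: sc_desc(9), info
! create the ScaLAPACK descriptor for the block-cyclically distributed matrix
call descinit(sc_desc, na, na, nblk, nblk, 0, 0, my_blacs_ctxt, max(1, na_rows), info)
if (info /= 0) then
  print *, "descinit failed, the block-cyclic distribution is invalid: info=", info
  stop
endif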

A typical usage of ELPA might look like this:

Fortran synopsis

use elpa
class(elpa_t), pointer :: elpaInstance
integer :: success

! We urge the user to always check the error code of all ELPA functions
if (elpa_init(20211125) /= ELPA_OK) then
  print *, "ELPA API version not supported"
  stop
endif

elpaInstance => elpa_allocate(success)
if (success /= ELPA_OK) then
  print *, "Could not allocate ELPA"
endif

! set parameters describing the matrix and its MPI distribution
call elpaInstance%set("na", na, success)
if (success /= ELPA_OK) then
  print *, "Could not set entry"
endif
call elpaInstance%set("nev", nev, success)
! check success code ...
call elpaInstance%set("local_nrows", na_rows, success)
! check success code ...
call elpaInstance%set("local_ncols", na_cols, success)
call elpaInstance%set("nblk", nblk, success)
call elpaInstance%set("mpi_comm_parent", MPI_COMM_WORLD, success)
call elpaInstance%set("process_row", my_prow, success)
call elpaInstance%set("process_col", my_pcol, success)

! set up the elpa object
success = elpaInstance%setup()
if (success /= ELPA_OK) then
  print *, "Could not setup ELPA object"
endif

! settings for GPU
call elpaInstance%set("gpu", 1, success) ! 1=on, 0=off
! in case of GPU usage you have the choice whether ELPA
! should automatically assign each MPI task to a certain GPU
! (this is the default) or whether you want to set this assignment
! for _each_ task yourself
! set the assignment yourself (only using one task here and assigning it
! to GPU id 1)
if (my_rank .eq. 0) call elpaInstance%set("use_gpu_id", 1, success)

! if desired, set tunable run-time options
! here we want to use the 2-stage solver
call elpaInstance%set("solver", ELPA_SOLVER_2STAGE, success)
! and set a specific kernel (must be supported on the machine)
! the CALLING order is important: you have FIRST to set the solver to ELPA_SOLVER_2STAGE
! and THEN you can choose a kernel other than the DEFAULT kernel
call elpaInstance%set("real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, success)

... set and get all other options that are desired
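Options that have been set can also be queried again; a small sketch using the get method, assuming it behaves analogously to set for integer-valued options (the variable current_solver is just for illustration):

integer :: current_solver
! query an integer-valued option that was set before
call elpaInstance%get("solver", current_solver, success)
if (success /= ELPA_OK) then
  print *, "Could not get entry"
endif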

! if desired, you can store the settings and load them in another program
! (see the sketch after this example)
call elpaInstance%store_settings("save_to_disk.txt", success)
! use the method eigenvectors to solve the eigenvalue problem and obtain
! eigenvalues and eigenvectors
! other possible methods are described in \ref elpa_api::elpa_t derived type
call elpaInstance%eigenvectors(a, ev, z, success)
! cleanup
call elpa_deallocate(elpaInstance, success)
call elpa_uninit(success)
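The stored settings can be read back in another program; a minimal sketch, assuming the load_settings method as the counterpart to store_settings, an already allocated and set-up instance elpaInstance, and the file name written above:

! restore run-time settings that were written with store_settings
call elpaInstance%load_settings("save_to_disk.txt", success)
if (success /= ELPA_OK) then
  print *, "Could not load the stored ELPA settings"
endif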

C synopsis

#include <elpa/elpa.h>
elpa_t handle;
int error;
/* We urge the user to always check the error code of all ELPA functions */
if (elpa_init(20211125) != ELPA_OK) {
  fprintf(stderr, "Error: ELPA API version not supported\n");
  exit(1);
}
handle = elpa_allocate(&error);
if (error != ELPA_OK) {
  /* handle the error */
}
/* Set parameters describing the matrix and its MPI distribution */
elpa_set(handle, "na", na, &error);
elpa_set(handle, "nev", nev, &error);
elpa_set(handle, "local_nrows", na_rows, &error);
elpa_set(handle, "local_ncols", na_cols, &error);
elpa_set(handle, "nblk", nblk, &error);
elpa_set(handle, "mpi_comm_parent", MPI_Comm_c2f(MPI_COMM_WORLD), &error);
elpa_set(handle, "process_row", my_prow, &error);
elpa_set(handle, "process_col", my_pcol, &error);
/* Setup */
error = elpa_setup(handle);
/* if desired, set tunable run-time options */
/* here we want to use the 2-stage solver */
elpa_set(handle, "solver", ELPA_SOLVER_2STAGE, &error);
/* settings for GPU */
elpa_set(handle, "gpu", 1, &error); /* 1=on, 0=off */
/* in case of GPU usage you have the choice whether ELPA
   should automatically assign each MPI task to a certain GPU
   (this is the default) or whether you want to set this assignment
   for _each_ task yourself
   set the assignment yourself (only using one task here and assigning it
   to GPU id 1) */
if (my_rank == 0) elpa_set(handle, "use_gpu_id", 1, &error);
/* and set a specific kernel (must be supported on the machine)
the CALLING order is important: you have FIRST to set the solver to ELPA_SOLVER_2STAGE
and THEN you can choose a kernel other than the DEFAULT kernel */
elpa_set(handle, "real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, &error);

... set and get all other options that are desired

/* if you want you can store the settings and load them in another program */
elpa_store_settings(handle, "save_to_disk.txt", &error);
/* use the eigenvectors method to solve the eigenvalue problem */
/* other possible methods are described in \ref elpa_api::elpa_t derived type */
elpa_eigenvectors(handle, a, ev, z, &error);
/* cleanup */
elpa_deallocate(handle, &error);

Autotuning can be used like this:

Fortran synopsis

use elpa
class(elpa_t), pointer :: e
class(elpa_autotune_t), pointer :: tune_state
integer :: success

if (elpa_init(20211125) /= ELPA_OK) then
  print *, "ELPA API version not supported"
  stop
endif
e => elpa_allocate(success)

! set parameters describing the matrix and its MPI distribution
call e%set("na", na, success)
call e%set("nev", nev, success)
call e%set("local_nrows", na_rows, success)
call e%set("local_ncols", na_cols, success)
call e%set("nblk", nblk, success)
call e%set("mpi_comm_parent", MPI_COMM_WORLD, success)
call e%set("process_row", my_prow, success)
call e%set("process_col", my_pcol, success)

! set up the elpa object
success = e%setup()

! create autotune object
tune_state => e%autotune_setup(ELPA_AUTOTUNE_FAST, ELPA_AUTOTUNE_DOMAIN_REAL, success)

! you can set some options; these will then be FIXED for the autotuning step
! if desired, set tunable run-time options
! here we want to use the 2-stage solver
call e%set("solver", ELPA_SOLVER_2STAGE, success)
! and set a specific kernel (must be supported on the machine)
! the CALLING order is important: you have FIRST to set the solver to ELPA_SOLVER_2STAGE
! and THEN you can choose a kernel other than the DEFAULT kernel
call e%set("real_kernel", ELPA_2STAGE_REAL_AVX_BLOCK2, success)

... set and get all other options that are desired

iter = 0
do while (e%autotune_step(tune_state, success))
  iter = iter + 1
  call e%eigenvectors(a, ev, z, success)
  ! if needed you can save the autotune state at any point
  ! and resume it later (see the sketch after this example)
  if (iter > max_iter) then
    call e%autotune_save_state(tune_state, "autotune_checkpoint.txt", success)
    exit
  endif
enddo

! set the best options found by the autotuning
call e%autotune_set_best(tune_state, success)
! store the settings of the _TUNED_ ELPA object, if needed
call e%store_settings("autotuned_object.txt", success)
! deallocate autotune object
call elpa_autotune_deallocate(tune_state, success)
! cleanup
call elpa_deallocate(e, success)
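To resume a saved autotuning run in a later program, the state written by autotune_save_state can be read back before continuing the autotune loop. A minimal sketch, assuming a freshly allocated and set-up instance e (configured exactly as above), the checkpoint file name used in the example, and the availability of the autotune_load_state method in this API version:

! create a new autotune object and restore the previously saved state
tune_state => e%autotune_setup(ELPA_AUTOTUNE_FAST, ELPA_AUTOTUNE_DOMAIN_REAL, success)
call e%autotune_load_state(tune_state, "autotune_checkpoint.txt", success)
if (success /= ELPA_OK) then
  print *, "Could not load autotune state"
endif
! ... continue the do while (e%autotune_step(tune_state, success)) loop as above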

More examples can be found in the folder "test", where Fortran and C example programs are stored.