This work relates to the implementation of a performance portable version of DL_MESO (DPD) using the Kokkos library. It focuses on porting to DL_MESO (DPD) the first and second loops of the Verlet Velocity (VV) scheme for the time marching scheme. This allows to run DL_MESO on NVidia GPUs as well as on other GPUs or architectures (many-core hardware like KNL), allowing performance portability as well as separation of concern between computational science and HPC.
Description
The VV scheme is made of 3 steps:
- a first velocity and particle positions integration by Delta t/2,
- a force calculation, and
- a second velocity integration by Delta t/2.
Steps 1) and 2) are documented is the following two modules
DL_MESO (DPD) on Kokkos: Verlet Velocity step 1
DL_MESO (DPD) on Kokkos: Verlet Velocity step 2
Note: Kokkos is a C++ library, while DL_MESO (DPD) is in Fortran90 Language. The current implementation requires a transfer between Fortran to C++, due to the use of Fortran pointers not bound using the ISO_C_BINDING
standard. This constraint will be removed in future versions of DL_MESO.
Practical application
With the advent of heterogeneous hardware, achieving performance portability across different architectures is one of the main challenges in HPC. In fact, while specific languages, like CUDA, can give best performance for the NVidia hardware, they cannot be used with different GPU vendors limiting the usage across supercomputers worldwide.
In this module we use Kokkos, developed at Sandia National Laboratories, which consist of several C++ templated libraries which provide the capability to offload a workload to several different architectures, taking care of the memory layout and transfer between host and device.
Documentation and source code
The modules documentation is available on our software repository here. The modules have also been pushed into DL_MESO git repository as explained in the modules documentation.