This work relates to the implementation of a performance portable version of DL_MESO (DPD) using the Kokkos library. It focuses on porting to DL_MESO (DPD) the first and second loops of the Verlet Velocity (VV) scheme for the time marching scheme. This allows to run DL_MESO on NVidia GPUs as well as on other GPUs or architectures (many-core hardware like KNL), allowing performance portability as well as separation of concern between computational science and HPC.
The VV scheme is made of 3 steps:
- a first velocity and particle positions integration by Delta t/2,
- a force calculation, and
- a second velocity integration by Delta t/2.
Steps 1) and 2) are documented is the following two modules
Note: Kokkos is a C++ library, while DL_MESO (DPD) is in Fortran90 Language. The current implementation requires a transfer between Fortran to C++, due to the use of Fortran pointers not bound using the
ISO_C_BINDING standard. This constraint will be removed in future versions of DL_MESO.
With the advent of heterogeneous hardware, achieving performance portability across different architectures is one of the main challenges in HPC. In fact, while specific languages, like CUDA, can give best performance for the NVidia hardware, they cannot be used with different GPU vendors limiting the usage across supercomputers worldwide.
In this module we use Kokkos, developed at Sandia National Laboratories, which consist of several C++ templated libraries which provide the capability to offload a workload to several different architectures, taking care of the memory layout and transfer between host and device.
Documentation and source code
The modules documentation is available on our software repository here. The modules have also been pushed into DL_MESO git repository as explained in the modules documentation.