E-CAM researchers working at the Hartree Centre – Daresbury Laboratory have co-designed the DL_MESO Mesoscale Simulation package to run on multiple GPUs, and ran for the first time a Dissipative Particle Dynamics simulation of a very large system (1.8 billion particles) on 4096 GPUs.
Towards extreme scale dissipative particle dynamics simulations using multiple GPGPUs
J. Castagna, X. Guo, M. Seaton and A. O’Cais
Computer Physics Communications (2020) 107159
DOI: 10.1016/j.cpc.2020.107159 (open access)
A multi-GPGPU development for Mesoscale Simulations using the Dissipative Particle Dynamics method is presented. This distributed GPU acceleration development is an extension of the DL_MESO package to MPI+CUDA in order to exploit the computational power of the latest NVIDIA cards on hybrid CPU–GPU architectures. Details about the extensively applicable algorithm implementation and memory coalescing data structures are presented. The key algorithms’ optimizations for the nearest-neighbour list searching of particle pairs for short range forces, exchange of data and overlapping between computation and communications are also given. We have have carried out strong and weak scaling performance analyses with up to 4096 GPUs. A two phase mixture separation test case with 1.8 billion particles has been run on the Piz Daint supercomputer from the Swiss National Supercomputer Center. With CUDA aware MPI, proper GPU affinity, communication and computation overlap optimizations for multi-GPU version, the final optimization results demonstrated more than 94% efficiency for weak scaling and more than 80% efficiency for strong scaling. As far as we know, this is the first report in the literature of DPD simulations being run on this large number of GPUs. The remaining challenges and future work are also discussed at the end of the paper.