QMCPack Interfaces for Electronic Structure Computations

Quantum Monte Carlo (QMC) methods are a class of ab initio, stochastic techniques for the study of quantum systems. While QMC simulations are computationally expensive, they have the advantage of being accurate, fully ab initio and scalable to large numbers of cores with limited memory requirements.

These features make QMC methods a valuable tool to assess the accuracy of DFT computations, which are widely used in the fields of condensed matter physics, quantum chemistry and materials science.

QMCPack is a free package for QMC simulations of electronic structure, developed in several national laboratories in the US. The package is written in object-oriented C++, offers great flexibility in the choice of systems, trial wave functions and QMC methods, and supports massive parallelism and the use of GPUs.

Trial wave functions for electronic QMC computations commonly require single-electron orbitals, typically computed by DFT. The aim of the E-CAM pilot project described here is to build interfaces between QMCPack and other electronic structure codes, e.g. the DFT code Quantum Espresso.

These interfaces are used to manage the reading of orbitals, or their generation via DFT, within QMCPack, establishing an automated, black-box workflow for QMC computations. QMC simulations can, for example, be used to benchmark and validate DFT calculations: such a procedure can be employed in the study of many physical systems of interest in condensed matter physics, chemistry and materials science, with applications in industry, e.g. in the study of metal-ion or water-carbon interfaces.

The following modules have been built as part of this pilot project:

  • QMCQEPack, which provides the files to download and properly patch Quantum Espresso 5.3 to build the libpwinterface.so library; this library is required by the ESPWSCFInterface module to generate single-particle orbitals during a QMCPack computation using Quantum Espresso.
  • ESInterfaceBase, which provides a base class for a general interface to generate single-particle orbitals to be used in QMC simulations in QMCPack; implementations of specific interfaces, as derived classes of ESInterfaceBase, are available as separate modules. A sketch of this pattern is given below.
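QMCPack itself is written in C++, but the design can be pictured with a short sketch. The following Python analogue is purely illustrative: a base class declares the orbital-generation interface, and each external code gets a derived implementation (the method names here are hypothetical, not QMCPack's actual API).

```python
from abc import ABC, abstractmethod

class ESInterfaceBase(ABC):
    """Common interface for obtaining single-particle orbitals
    from an external electronic structure code (illustrative sketch)."""

    @abstractmethod
    def get_orbitals(self, system):
        """Return the single-particle orbitals for the given system."""

class ESPWSCFInterface(ESInterfaceBase):
    """Derived interface that would obtain orbitals by driving
    Quantum Espresso (pw.x) through libpwinterface.so."""

    def get_orbitals(self, system):
        # placeholder: the real module calls into the patched QE library
        return []
```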

Documentation on the interfaces in QMCPack can be found in the QMCPack user manual at https://github.com/michruggeri/qmcpack/blob/f88a419ad1a24c68b2fdc345ad141e05ed0ab178/manual/interfaces.tex


PANNA: Properties from Artificial Neural Network Architectures

PANNA is a package for training and validating neural networks to represent atomic potentials. It implements configurable all-to-all connected deep neural network architectures which allow for the exploration of training dynamics. Currently it includes tools to enable original [1] and modified [2] Behler-Parrinello input feature vectors, both for molecules and crystals, but the network can also be used in an input-agnostic fashion to enable further experimentation. PANNA is written in Python and relies on TensorFlow as its underlying engine.

A common way to use PANNA in its current implementation is to train a neural network to estimate the total energy of a molecule or crystal, as a sum of atomic contributions, by learning from reference total-energy calculations (usually ab initio) for similar structures.

The neural network models in the literature often start from a description of the system of interest in terms of local feature vectors for each atom in the configuration. PANNA provides tools to calculate two versions of the Behler-Parrinello local descriptors, but it allows the use of any species-resolved, fixed-size array that describes the input data.
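As an illustration of the kind of descriptor involved, below is a minimal numpy sketch of the original Behler-Parrinello radial symmetry function of Ref. [1]; the parameter values are made up, and this is not PANNA's actual implementation.

```python
import numpy as np

def g2(r_ij, eta, r_s, r_c):
    """Behler-Parrinello radial symmetry function G2 for one atom:
    Gaussians of the neighbour distances, damped by the cutoff f_c."""
    r_ij = np.asarray(r_ij, dtype=float)
    # cutoff function f_c(R) = 0.5*(cos(pi*R/R_c) + 1) for R <= R_c, else 0
    fc = np.where(r_ij <= r_c, 0.5 * (np.cos(np.pi * r_ij / r_c) + 1.0), 0.0)
    return float(np.sum(np.exp(-eta * (r_ij - r_s) ** 2) * fc))

# made-up neighbour distances and parameters, for illustration only
print(g2([1.0, 1.5, 2.8], eta=4.0, r_s=1.0, r_c=3.0))
```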

PANNA allows the construction of neural network architectures with different sizes for each of the atomic species in the training set. Currently the allowed architecture is a deep neural network of fully connected layers, starting from the input feature vector and going through one or more hidden layers. The user can choose to train or freeze any layer, and can also transfer network parameters between species upon restart. The sum-of-atomic-contributions idea is sketched below.
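A minimal numpy sketch of this architecture, with hypothetical layer sizes, species and weights (not PANNA's actual data structures): each atom's G-vector passes through its species' fully connected network, and the per-atom energies are summed.

```python
import numpy as np

def atomic_energy(g, layers):
    """Feed one atom's G-vector through its species' fully connected layers."""
    h = g
    for W, b in layers[:-1]:
        h = np.tanh(W @ h + b)      # hidden layers with a smooth activation
    W, b = layers[-1]
    return (W @ h + b).item()       # scalar atomic energy contribution

rng = np.random.default_rng(0)

# hypothetical setup: one small network per species (all sizes arbitrary)
nets = {s: [(rng.standard_normal((16, 32)), rng.standard_normal(16)),
            (rng.standard_normal((1, 16)), rng.standard_normal(1))]
        for s in ("C", "H")}

species = ["C", "H", "H", "H", "H"]       # e.g. methane
gvecs = rng.standard_normal((5, 32))      # per-atom descriptors (G-vectors)

# total energy = sum of per-atom contributions, as described above
E = sum(atomic_energy(g, nets[s]) for s, g in zip(species, gvecs))
print(E)
```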

In summary, PANNA is an easy-to-use interface for obtaining neural network models for atomistic potentials, leveraging the highly optimized TensorFlow infrastructure to provide an efficient and parallelized, GPU-accelerated training.

It provides:

  • an input creation tool (atomistic calculation result -> G-vector)
  • an input packaging tool for quick processing by TensorFlow (G-vector -> TFData bundle)
  • a network training tool
  • a network validation tool
  • a LAMMPS plugin
  • a bundle of sample data for testing[3]

See the full documentation of PANNA at https://gitlab.com/PANNAdevs/panna/blob/master/doc/PANNA_documentation.md

GitLab repository for PANNA: https://gitlab.com/PANNAdevs/panna

See manuscript at https://arxiv.org/abs/1907.03055

References

[1] J. Behler and M. Parrinello, “Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces”, Phys. Rev. Lett. 98, 146401 (2007)

[2] Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg, “ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost”, Chemical Science (2017), DOI: 10.1039/C6SC05720A

[3] Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg, “ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules”, Scientific Data 4 (2017), Article number: 170193, DOI: 10.1038/sdata.2017.193


pyscal – A Python module for structural analysis of atomic environments

Description

pyscal is a Python module for the calculation of local atomic structural environments, including Steinhardt’s bond orientational order parameters [1], during post-processing of atomistic simulation data. The core functionality of pyscal is written in C++ with Python wrappers using pybind11, which allows for fast calculations and easy extensions in Python.

Practical Applications

Steinhardt’s order parameters are widely used for the identification of crystal structures [3]. They are also used to distinguish whether an atom is in a solid or liquid environment [4]. pyscal is inspired by the BondOrderAnalysis code, but has since incorporated many additional features and modifications. The pyscal module includes the following functionalities, illustrated in the usage sketch after the list:

  • calculation of Steinhardt’s order parameters and their averaged version [2].
  • links with the Voro++ code, for the calculation of Steinhardt parameters weighted using the face areas of Voronoi polyhedra [3].
  • classification of atoms as solid or liquid [4].
  • clustering of particles based on a user defined property.
  • methods for calculating radial distribution functions, Voronoi volumes of particles, number of vertices and face area of Voronoi polyhedra, and coordination numbers.
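A minimal usage sketch, based on the pyscal documentation (the input file name and the cutoff value are illustrative):

```python
import pyscal.core as pc

sys = pc.System()
sys.read_inputfile("conf.dump")                  # e.g. a LAMMPS dump file
sys.find_neighbors(method="cutoff", cutoff=3.6)  # fixed-cutoff neighbour list
sys.calculate_q([4, 6], averaged=True)           # Steinhardt q4, q6 and averages
q6 = sys.get_qvals(6)                            # per-atom q6 values
sys.find_solids()                                # classify atoms as solid/liquid
```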

Background information

See the application documentation for full details. A paper about pyscal is also available in Ref. [5].

The utilisation of Dask within the project came about as a result of the E-CAM High Throughput Computing ESDW held in Turin in 2018 and 2019.

The software module was developed by Sarath Menon, Grisell Díaz Leines and Jutta Rogal, and is released under the GNU General Public License v3.0.

References

[1] Steinhardt, P. J., Nelson, D. R., & Ronchetti, M. (1983). Physical Review B, 28.

[2] Lechner, W., & Dellago, C. (2008). The Journal of Chemical Physics, 129.

[3] Mickel, W., Kapfer, S. C., Schröder-Turk, G. E., & Mecke, K. (2013). The Journal of Chemical Physics, 138.

[4] Auer, S., & Frenkel, D. (2005). Advances in Polymer Science, 173.

[5] Menon, S., Díaz Leines, G., & Rogal, J. (2019). pyscal: A python module for structural analysis of atomic environments. Journal of Open Source Software, 4(43), 1824


Multi-GPU version of DL_MESO_DPD

This module implements the first version of the DL_MESO_DPD mesoscale simulation package to run on multiple NVIDIA graphics processing units (GPUs).

In this module the main framework of a multi-GPU version of the DL_MESO_DPD code has been developed. The exchange of data between GPUs overlaps with the computation of the forces for the internal cells of each partition (a domain decomposition approach based on the MPI parallel version of DL_MESO_DPD has been followed). The current implementation is a proof of concept and relies on slow transfers of data from the GPU to the host and vice versa. Faster implementations will be explored in future modules.
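The overlap pattern can be sketched schematically with mpi4py; this is not DL_MESO_DPD's actual (Fortran/CUDA) code, and the buffer sizes and force routines are placeholders.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

halo_out = np.random.rand(128, 3)   # boundary-cell particle data to send
halo_in = np.empty_like(halo_out)   # buffer for the neighbour's boundary data

def interior_forces():
    """Placeholder for the force loop over internal cells (needs no halo)."""
    return np.zeros((1024, 3))

def boundary_forces(halo):
    """Placeholder for the force loop over cells that need halo data."""
    return np.zeros((len(halo), 3))

# 1. post the non-blocking halo exchange (the GPU-host-GPU copies in the module)
reqs = [comm.Isend(halo_out, dest=right),
        comm.Irecv(halo_in, source=left)]

# 2. overlap: compute forces for internal cells while the data is in flight
f_int = interior_forces()

# 3. complete the exchange, then handle the cells that need neighbour data
MPI.Request.Waitall(reqs)
f_bnd = boundary_forces(halo_in)
```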

Future plans include benchmarking of the code with data transfer implementations other than the current (trivial) GPU-host-GPU transfer mechanism. These are: peer-to-peer communication within a node, CUDA-aware MPI, and CUDA-aware MPI with Direct Remote Memory Access (DRMA).

Practical application and exploitation of the code

Dissipative Particle Dynamics (DPD) is routinely used in an industrial context to investigate the static and dynamic behaviour of soft-matter systems. Examples include colloidal dispersions, emulsions and other amphiphilic systems, polymer solutions, etc. Such materials are produced or processed in industries such as cosmetics, food, pharmaceutics and biomedicine. Porting the method to GPUs is thus inherently useful, as it provides cheaper calculations.

See more information in the industry success story recently reported by E-CAM.

Software documentation and link to the source code can be found in our E-CAM software Library here.


Integrating LAMMPS with OpenPathSampling

This module shows how LAMMPS can be used as the molecular dynamics (MD) engine in OpenPathSampling (OPS), and it also provides a benchmark for the impact of the OPS overhead on the MD engine.

Practical application and exploitation of the code

OpenPathSampling uses OpenMM as its default engine for calculating the sampled trajectories. Other engines such as GROMACS and LAMMPS can also be used (although they are not yet available in the official release), making it possible to exploit different computer architectures, such as hybrid CPU-GPU, and to simulate more complex problems.

In this module we present the source code for the integration of OPS with LAMMPS, as well as a benchmark of a simple test case showing the impact of the OPS overhead on performance.
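As an indication of what the integration involves, the sketch below uses the LAMMPS Python interface to advance the dynamics and extract coordinates, the kind of call sequence an OPS engine wrapper issues for each trajectory frame; the input file name is hypothetical, and this is not the module's actual wrapper code.

```python
from lammps import lammps   # LAMMPS Python wrapper (needs the LAMMPS shared library)

lmp = lammps()
lmp.file("in.lj")                        # hypothetical input deck defining the system
lmp.command("run 100 pre yes post no")   # advance MD by one frame's worth of steps
natoms = lmp.get_natoms()
coords = lmp.gather_atoms("x", 1, 3)     # flat ctypes array of length 3*natoms
frame = [(coords[3*i], coords[3*i+1], coords[3*i+2]) for i in range(natoms)]
```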

Software documentation and link to the source code can be found in our E-CAM software Library here.


FFTXlib, a rewrite and optimisation of earlier versions of FFT-related routines inside QE pre-v6

FFTXlib is mainly a rewrite and optimisation of earlier versions of the FFT-related routines inside Quantum ESPRESSO (QE) pre-v6, and ultimately their replacement. Despite many similarities, the current version of FFTXlib dramatically changes the FFT strategy in parallel execution, from the 1D+2D FFT performed in QE pre-v6 to a 1D+1D+1D one, to allow for greater flexibility in parallelisation.
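The 1D+1D+1D strategy can be illustrated with a small serial numpy sketch: a 3D FFT factorises exactly into three successive 1D transforms, one per axis; in a distributed-memory code the data is redistributed between stages so that the axis being transformed is always local to each task.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8, 8)) + 1j * rng.standard_normal((8, 8, 8))

# 1D+1D+1D: transform one axis at a time; a parallel code redistributes
# the data between stages so the next axis is local to each task
step = np.fft.fft(a, axis=0)
step = np.fft.fft(step, axis=1)
step = np.fft.fft(step, axis=2)

assert np.allclose(step, np.fft.fftn(a))  # identical to a single 3D FFT
```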

Practical application and exploitation of the code

The FFTXlib module is a collection of driver routines that allow the user to perform complex 3D fast Fourier transforms (FFTs) in the context of plane-wave based electronic structure software. It contains routines to initialise the array structures and to calculate the desired grid shapes. It imposes the underlying size assumptions and provides correspondence maps for indices between the two transform domains.

Once this data structure is constructed, forward or inverse in-place FFTs can be performed. For this purpose FFTXlib can either use a local copy of an earlier version of FFTW (a commonly used open-source FFT library) or serve as a wrapper to external FFT libraries via conditional compilation using pre-processor directives. It supports both MPI and OpenMP parallelisation technologies.

FFTXlib is currently employed within the Quantum Espresso package, a widely used suite of codes for electronic structure calculations and materials modelling at the nanoscale, based on plane waves and pseudopotentials.

FFTXlib is also interfaced with the “miniPWPP” module, which solves the Kohn-Sham equations in a plane-wave basis and is soon to be released as part of the E-CAM Electronic Structure Library.

Software documentation and link to the source code can be found in our E-CAM software Library here.


Extension of the ParaDiS code to include precipitate interactions, and code optimisation to run in an HPC environment


Here we present two featured software modules of the month:

  1. ParaDiS with precipitates
  2. ParaDiS with precipitates optimized to HPC environment

These modules provide extensions to the ParaDiS discrete dislocation dynamics (DDD) code (LLNL, http://paradis.stanford.edu/) in which dislocation-precipitate interactions are included. Module 2 was built to run the code in an HPC environment, by optimizing the original code for the Cray XC40 cluster at CSC in Finland. The software was developed by E-CAM partners at CSC and Aalto University (Finland).

Practical application and exploitation of the codes

The ParaDiS code is a free, large-scale dislocation dynamics (DD) simulation code for studying the fundamental mechanisms of plasticity. However, DDD simulations do not always take into account impurities interacting with the dislocations and their motion. The consequences of impurities are manifold: the yield stress is changed and, in general, the plastic deformation process is greatly affected. Simulating these effects by DDD allows a large number of issues to be examined, from materials design to controlling the yield stress, and may be done in a multiscale manner, by computing the dislocation-precipitate interactions from microscopic simulations or by coarse-graining the DDD results for the stress-strain curves on the mesoscopic scale for more macroscopic finite element methods.

Modules 1 and 2 therefore provide an extension of the ParaDiS code that includes dislocation-precipitate interactions, together with the ability to run the code in HPC environments.

Software documentation and link to the source code can be found in our E-CAM software Library here.


DBCSR@MatrixSwitch, an optimised library to deal with sparse matrices

MatrixSwitch is a module which acts as an intermediary interface layer between high-level and low-level routines dealing with matrix storage and manipulation. It allows a seamless switch between different software implementations of the matrix operations.

DBCSR is an optimised library to deal with sparse matrices, which appear frequently in many kinds of numerical simulations.

In DBCSR@MatrixSwitch, DBCSR capabilities have been added to MatrixSwitch as an optional library dependency.

Carrying out calculations in serial mode can be too slow, so a parallelisation strategy is needed. In serial/parallel mode, MatrixSwitch employs LAPACK/ScaLAPACK to perform matrix operations, irrespective of their dense or sparse character. The disadvantage of the LAPACK/ScaLAPACK schemes is that they are not optimised for sparse matrices. DBCSR provides the necessary algorithms to solve this problem and, in addition, is especially suited to working in parallel.
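The interface-layer idea can be pictured with a small Python analogy (MatrixSwitch and DBCSR are Fortran libraries with their own APIs): a single routine dispatches the same matrix operation to a dense or a sparse backend.

```python
import numpy as np
from scipy import sparse

def matmul(a, b, backend="dense"):
    """Dispatch a matrix product to a dense or a sparse implementation,
    mimicking the switch layer (illustrative only, not the real API)."""
    if backend == "dense":
        return np.asarray(a) @ np.asarray(b)   # LAPACK-style dense path
    if backend == "sparse":
        # DBCSR-style sparse path, here via scipy's CSR format
        return (sparse.csr_matrix(a) @ sparse.csr_matrix(b)).toarray()
    raise ValueError(f"unknown backend: {backend}")

a = np.eye(4)
b = np.arange(16.0).reshape(4, 4)
assert np.allclose(matmul(a, b, "dense"), matmul(a, b, "sparse"))
```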

Direct link to module documentation: https://e-cam.readthedocs.io/en/latest/Electronic-Structure-Modules/modules/MatrixSwitchDBCSR/readme.html

Share

Abrupt GC-AdResS: A new and more general implementation of the Grand Canonical Adaptive Resolution Scheme (GC-AdResS)

The Grand Canonical Adaptive Resolution Scheme (GC-AdResS) provides a methodological framework for partitioning a simulation box into different regions with different degrees of accuracy. For more details on the theory see Refs. [1,2,3].

In the context of an E-CAM pilot project focused on the development of the GC-AdResS scheme, an updated version of GC-AdResS was built and implemented in GROMACS, as reported in https://aip.scitation.org/doi/10.1063/1.5031206 (open access version: https://arxiv.org/abs/1806.09870). The main goal of the project is to develop a library or recipe with which GC-AdResS can be implemented in any classical MD code.

The current implementation of GC-AdResS in GROMACS has several performance problems. The main performance loss of AdResS simulations in GROMACS lies in the neighbour-list search and in the generic, serial force calculation linking the atomistic (AT) and coarse-grained (CG) forces together via a smooth weighting function. Thus, to remove this performance bottleneck, eliminate the non-optimised force calculation, and remove a hindrance to an easy, general implementation in other codes, we had to change the neighbour-list search. This led to a considerable speed-up of the code. Furthermore, it decouples the method from the core of any MD code, which avoids hindering performance and makes the scheme hardware independent [4].
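For reference, the conventional smooth coupling uses a weighting function of the standard AdResS form (see Refs. [1-3]); below is a minimal numpy sketch with an illustrative cos² ramp over a hybrid region of width d_hy (the function and argument names are ours, not GROMACS's).

```python
import numpy as np

def adress_weight(x, d_at, d_hy):
    """Conventional smooth AdResS weighting along one direction:
    1 in the atomistic zone (|x| < d_at), a cos^2 ramp across the
    hybrid region of width d_hy, 0 in the coarse-grained zone."""
    r = np.abs(np.asarray(x, dtype=float))
    ramp = np.cos(np.pi * (r - d_at) / (2.0 * d_hy)) ** 2
    return np.where(r < d_at, 1.0, np.where(r > d_at + d_hy, 0.0, ramp))

# weights at an atomistic, a hybrid and a coarse-grained position
print(adress_weight([0.0, 1.2, 2.5], d_at=1.0, d_hy=1.0))
```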

This module presents a very straightforward way to implement a new partitioning scheme in GROMACS, which solves the two problems that affect performance: the neighbour-list search and the generic force kernel.

Information about module purpose, background information, software installation, testing and a link to the source code, can be found in our E-CAM software Library here.

E-CAM Deliverables D4.3[5] and D4.4[6] present more modules developed in the context of this pilot project.

References

[1] L. Delle Site and M. Praprotnik, “Molecular Systems with Open Boundaries: Theory and Simulation,” Phys. Rep., vol. 693, pp. 1–56, 2017

[2] H.Wang, C. Schütte, and L.Delle Site, “Adaptive Resolution Simulation (AdResS): A Smooth Thermodynamic and Structural Transition fromAtomistic to Coarse Grained Resolution and Vice Versa in a Grand Canonical Fashion,” J. Chem. Theory Comput., vol. 8, pp. 2878–2887, 2012

[3] H. Wang, C. Hartmann, C. Schütte, and L. Delle Site, “Grand-Canonical-Like Molecular-Dynamics Simulations by Using an Adaptive-Resolution Technique,” Phys. Rev. X, vol. 3, p. 011018, 2013

[4] C. Krekeler, A. Agarwal, C. Junghans, M. Praprotnik and L. Delle Site, “Adaptive resolution molecular dynamics technique: Down to the essential”, J. Chem. Phys. 149, 024104 (2018)

[5] B. Duenweg, J. Castagna, S. Chiacchiera, H. Kobayashi, and C. Krekeler, “D4.3: Meso– and multi–scale modelling E-CAM modules II”, March 2018. [Online]. Available: https://doi.org/10.5281/zenodo.1210075

[6] B. Duenweg, J. Castagna, S. Chiacchiera, and C. Krekeler, “D4.4: Meso– and multi–scale modelling E-CAM modules III”, January 2019. [Online]. Available: https://doi.org/10.5281/zenodo.2555012


Porting of electrostatics to the GPU version of DL_MESO_DPD


The porting of DL_MESO_DPD [1,2] to graphics cards (GPUs) was reported in deliverable D4.2 of E-CAM [3] (for a single GPU) and deliverable D4.3 [4] (for multiple GPUs) (Figure 1), and has now been extended to include electrostatics, with two alternative schemes as explained below. This work was recently reported in deliverable D4.4 [5].

Figure 1: DL_MESO strong scaling results on Piz Daint, obtained using 1.8 billion particles on 256 to 2048 GPUs. Results show very good scaling, with efficiency always above 89% up to 2048 GPUs.


To allow Dissipative Particle Dynamics (DPD) methods to treat systems with electrically charged particles, several approaches have been proposed in the literature, mostly based on the Ewald summation method [6]. The DL_MESO_DPD code includes the standard Ewald and Smooth Particle Mesh Ewald (SPME) methods (in version 2.7, released in December 2018). Accordingly, the same methods are implemented here for the single-GPU version of the code.
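For context, the Ewald method splits the Coulomb sum into a short-ranged, screened real-space part and a smooth reciprocal-space part. A minimal Python sketch of the real-space pair term (Gaussian units; purely illustrative, not DL_MESO_DPD's actual kernel):

```python
import numpy as np
from scipy.special import erfc

def ewald_real(qi, qj, r, alpha):
    """Real-space part of the Ewald pair energy: the erfc factor screens
    the 1/r interaction beyond ~1/alpha, while the smooth remainder is
    summed in reciprocal space (as in the standard and SPME schemes)."""
    return qi * qj * erfc(alpha * r) / r

# made-up charges, separation and splitting parameter, for illustration
print(ewald_real(1.0, -1.0, r=0.5, alpha=2.0))
```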
