Development of an HTC-based, scalable committor analysis tool in OpenPathSampling opens avenues to investigate enzymatic mechanisms linked to Covid-19

The E-CAM HPC Centre of Excellence and a PRACE team in Wroclaw have teamed up to develop High Throughput Computing (HTC) based tools to enable the computational investigation of reaction mechanisms in complex systems. These tools could help us understand the enzymatic mechanisms used by the SARS-CoV-2 main protease [1].

Studying reaction mechanisms in complex systems requires powerful simulations. Committor analysis is a powerful but computationally expensive tool developed for this purpose. A less expensive alternative is to use the committor to generate initial trajectories for transition path sampling. The main goal of this project was to integrate the committor analysis methodology into an existing software application, OpenPathSampling (OPS) [2,3], which performs many variants of transition path sampling (TPS) and transition interface sampling (TIS), as well as other useful calculations for rare events. OPS is performance portable across a range of HPC hardware and hosting sites.

Committor analysis is essentially an ensemble calculation that maps straightforwardly to an HTC workflow, in which typical individual tasks have moderate scalability and indefinite duration. Since this workflow requires dynamic and resilient scalability within the HTC framework, OPS was coupled to E-CAM’s HTC library jobqueue_features [4], which leverages the Dask [5, 6] data analytics framework and implements support for the management of MPI-aware tasks.
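To make the "ensemble of independent tasks" structure concrete, here is a minimal sketch on a toy model. It does not use OPS, Dask, or jobqueue_features: the one-dimensional double-well potential, the overdamped Langevin integrator, the state definitions, and the names `shoot` and `committor` are all assumptions chosen for illustration, and the standard-library thread pool merely stands in for an HTC scheduler.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def force(x):
    # Toy double-well potential V(x) = (x^2 - 1)^2, so F = -dV/dx.
    return -4.0 * x * (x * x - 1.0)


def shoot(x0, seed, dt=1e-3, kT=0.2, max_steps=200_000):
    """One independent task: run overdamped Langevin dynamics from x0
    until the trajectory enters state A (x <= -1) or state B (x >= 1).
    Returns "A", "B", or None if neither state is reached in max_steps."""
    rng = random.Random(seed)          # per-task random stream
    sigma = (2.0 * kT * dt) ** 0.5     # noise amplitude per step
    x = x0
    for _ in range(max_steps):
        x += force(x) * dt + sigma * rng.gauss(0.0, 1.0)
        if x <= -1.0:
            return "A"
        if x >= 1.0:
            return "B"
    return None


def committor(x0, n_shots, executor, base_seed=0):
    """Estimate the committor at x0 as the fraction of committed
    trajectories that end in state B. Each shot is submitted as a
    separate task, so the tasks can run in any order and in parallel."""
    seeds = [base_seed + i for i in range(n_shots)]
    outcomes = list(executor.map(shoot, [x0] * n_shots, seeds))
    committed = [o for o in outcomes if o is not None]
    if not committed:
        raise RuntimeError("no trajectory reached a stable state")
    return sum(o == "B" for o in committed) / len(committed)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        # By symmetry, the barrier top x0 = 0 should give a value near 0.5.
        print(committor(0.0, 200, pool))
```

Because each task has an unpredictable duration (a trajectory runs until it commits), a scheduler that hands out tasks dynamically suits this workload better than a static partition of the work.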

The HTC library jobqueue_features proved to be resilient and to scale extremely well, meaning it can handle a very high number of simultaneous tasks: a stress test showed it can scale out to 1M tasks on all available architectures. OPS was expanded and its integration with the jobqueue_features library was made trivial. OPS can now transition almost seamlessly from use on a personal laptop to some of the largest HPC sites in Europe.

Integrating OPS and the HTC library resulted in an unprecedented parallelised committor simulation capability. These tools are currently being applied to a committor simulation of the SARS-CoV-2 main protease. An initial analysis of the stable states, based on a long trajectory provided by D.E. Shaw Research [7], suggests that a loop region of the protein may act as a gate to the active site (Figure). This conformational change may regulate the accessibility of the active site of the main protease, and a better understanding of its mechanism could aid drug design.

The committor simulation can be used to explore the configuration space (taking more initial configurations), or to improve the accuracy of the calculated committor value (running more trajectories per configuration). Altogether, such data will provide insight into the dynamics of the protease loop region and the mechanism of its gate-like activity. In addition, the trajectories generated by the committor simulation can also be used as initial conditions for further studies using the transition path sampling approach.
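The trade-off between more configurations and more trajectories per configuration can be quantified with a short sketch. It assumes that the trajectories started from one configuration are statistically independent, so the number ending in the product state is binomially distributed; the function names below are illustrative and are not part of OPS.

```python
import math


def committor_std_error(p_b, n_trajectories):
    """Standard error of a committor estimate p_b obtained from
    n_trajectories independent trajectories (binomial statistics)."""
    return math.sqrt(p_b * (1.0 - p_b) / n_trajectories)


def trajectories_needed(p_b, target_error):
    """Smallest number of independent trajectories for which the
    binomial standard error does not exceed target_error."""
    return math.ceil(p_b * (1.0 - p_b) / target_error ** 2)


def configurations_affordable(total_trajectories, p_b, target_error):
    """Number of initial configurations that fit into a fixed trajectory
    budget if each one must reach target_error."""
    return total_trajectories // trajectories_needed(p_b, target_error)


if __name__ == "__main__":
    # Near the transition state (p_b = 0.5) the error is largest.
    print(committor_std_error(0.5, 100))
    print(configurations_affordable(100_000, 0.5, 0.05))
```

Since the error falls only as the inverse square root of the number of trajectories, halving the uncertainty at one configuration costs roughly four times as many trajectories, which is the budget that could otherwise cover more of configuration space.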

Figure: Left image: “closed” configuration. Right image: “open” configuration.

References

[1] Milosz Bialczak, Alan O’Cais, Mariusz Uchronski, & Adam Wlodarczyk. (2020). Intelligent HTC for Committor Analysis. http://doi.org/10.5281/zenodo.4382017

[2] David W.H. Swenson, Jan-Hendrik Prinz, Frank Noé, John D. Chodera, and Peter G. Bolhuis. “OpenPathSampling: A flexible, open framework for path sampling simulations. 1. Basics.” J. Chem. Theory Comput. 15, 813 (2019). https://doi.org/10.1021/acs.jctc.8b00626

[3] David W.H. Swenson, Jan-Hendrik Prinz, Frank Noé, John D. Chodera, and Peter G. Bolhuis. “OpenPathSampling: A flexible, open framework for path sampling simulations. 2. Building and Customizing Path Ensembles and Sample Schemes.” J. Chem. Theory Comput. 15, 837 (2019). https://doi.org/10.1021/acs.jctc.8b00627

[4] Alan O’Cais, David Swenson, Mariusz Uchronski, and Adam Wlodarczyk. Task Scheduling Library for Optimising Time-Scale Molecular Dynamics Simulations, August 2019.

[5] Dask Development Team. Dask: Library for dynamic task scheduling, 2016.

[6] Matthew Rocklin. Dask: Parallel computation with blocked algorithms and task scheduling. In Kathryn Hu and James Bergstra, editors, Proceedings of the 14th Python in Science Conference, pages 130 – 136, 2015.

[7] No specific author. Long trajectory provided by D.E. Shaw Research. https://www.deshawresearch.com/downloads/download_trajectory_sarscov2.cgi/, 2020. [Online; accessed 22-Oct-2020].


March Module of the Month: DL_MESO (DPD) on Kokkos for enhanced performance portability

 

This work relates to the implementation of a performance portable version of DL_MESO (DPD) using the Kokkos library. It focuses on porting the first and second loops of the velocity Verlet (VV) time-marching scheme in DL_MESO (DPD). This allows DL_MESO to run on NVIDIA GPUs as well as on other GPUs or architectures (many-core hardware like KNL), providing performance portability as well as a separation of concerns between computational science and HPC.

Description

The VV scheme is made of 3 steps:

  1. a first velocity and particle positions integration by Delta t/2,
  2. a force calculation, and
  3. a second velocity integration by Delta t/2.
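The three steps above can be sketched in a few lines. This is a textbook velocity Verlet step for a single particle under a conservative force, written in Python for readability; it is not DL_MESO's Fortran/Kokkos implementation, and DPD additionally involves dissipative and random forces that are omitted here. The harmonic force and all names are assumptions chosen for illustration.

```python
def velocity_verlet_step(x, v, f, force, mass, dt):
    """Advance position x and velocity v by one time step dt.
    f is the force at the current position; force is a callable."""
    # Step 1: first half-step velocity update, then position update.
    v_half = v + 0.5 * dt * f / mass
    x_new = x + dt * v_half
    # Step 2: force calculation at the new position.
    f_new = force(x_new)
    # Step 3: second half-step velocity update with the new force.
    v_new = v_half + 0.5 * dt * f_new / mass
    return x_new, v_new, f_new


def harmonic_force(x, k=1.0):
    # Illustrative conservative force F = -k x.
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    # Kinetic plus potential energy of the harmonic oscillator.
    return 0.5 * mass * v * v + 0.5 * k * x * x


def simulate(x0, v0, n_steps, dt=0.01, mass=1.0):
    """Run n_steps of velocity Verlet and return the final (x, v)."""
    x, v = x0, v0
    f = harmonic_force(x)
    for _ in range(n_steps):
        x, v, f = velocity_verlet_step(x, v, f, harmonic_force, mass, dt)
    return x, v


if __name__ == "__main__":
    x, v = simulate(1.0, 0.0, 10_000)
    # The total energy should stay close to its initial value of 0.5.
    print(total_energy(x, v))
```

Splitting the update this way means the force is evaluated only once per step, which matters because the force loop is usually the most expensive part of the time step.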

Steps 1) and 2) are documented in the following two modules:

DL_MESO (DPD) on Kokkos: Verlet Velocity step 1

DL_MESO (DPD) on Kokkos: Verlet Velocity step 2

Note: Kokkos is a C++ library, while DL_MESO (DPD) is written in Fortran 90. The current implementation requires a data transfer between Fortran and C++ because the Fortran pointers used are not bound using the ISO_C_BINDING standard. This constraint will be removed in future versions of DL_MESO.

Practical application

With the advent of heterogeneous hardware, achieving performance portability across different architectures is one of the main challenges in HPC. While vendor-specific languages such as CUDA can give the best performance on NVIDIA hardware, they cannot be used with GPUs from other vendors, which limits their usage across supercomputers worldwide.

In this module we use Kokkos, developed at Sandia National Laboratories, which consists of several templated C++ libraries that provide the capability to offload a workload to several different architectures, taking care of the memory layout and the transfer between host and device.

Documentation and source code

The module documentation is available on our software repository here. The modules have also been pushed to the DL_MESO git repository, as explained in the module documentation.


March Module of the Month: n2p2 – Improved link to HPC MD software

 

This module improves the connection of n2p2 to HPC software, in particular to LAMMPS, by creating a pull request to the official LAMMPS repository. Furthermore, the build process for the n2p2 interface library is enhanced to allow for a selective activation of different interfaces. A first application is also supported: the user contribution of an n2p2 interface to CabanaMD which uses Kokkos to drive MD simulations with NNP support on GPUs.

Description

This module is based on n2p2 (NeuralNetworkPotentialPackage), a C++ code for generation and application of neural network potentials used in molecular dynamics simulations. The source code and documentation are located here:

Although n2p2 already shipped with source files for patching LAMMPS (see here), the build process required manual intervention by users. To avoid this in future versions of LAMMPS, a pull request was created to include the n2p2/LAMMPS interface by default as a user package. To conform with the LAMMPS contribution guidelines, multiple issues were resolved, triggering these changes and additions to LAMMPS and n2p2:

• Modify the CMake build process to search and include n2p2

• Create additional documentation about the build settings

• Create a suitable example which can be shipped with LAMMPS

• Adapt documentation of the LAMMPS “pair_style nnp” command

• Change n2p2 to conform with LAMMPS “bigbig” settings

• Change the source files “pair_nnp.(cpp/h)” to conform with the LAMMPS coding style

Furthermore, the n2p2 build system was adapted to allow for multiple interfaces to other software packages, with an option to select only those of interest to the user. As a first application, the user-contributed CabanaMD interface was integrated into the new build process. CabanaMD is an ECP proxy application which makes use of the Kokkos performance portability library and n2p2 to port neural network potentials in MD simulations to GPUs and other HPC hardware.

Practical applications

The integration of neural network potentials directly in LAMMPS via a user package with linkage to n2p2 will greatly enhance visibility and the user experience. Users will also be able to retrieve information about the neural network potential method and its use directly on the LAMMPS documentation page. Modifying the n2p2 build process to allow for multiple interfaces to other software simplifies the development of CabanaMD. This contribution from n2p2 users can be viewed as a precursor of a Kokkos implementation of NNPs in LAMMPS. Ultimately, such an addition to n2p2/LAMMPS would be of great value to the community, as it would allow molecular dynamics simulations with NNPs to run on GPUs.

A success story on the E-CAM developments in n2p2 is also available, which explains in more detail the practical applications of this work: Implementation of High-Dimensional Neural Network Potentials.

Documentation and source Code

Module documentation is available on our software repository here.


Implementation of High-Dimensional Neural Network Potentials

 

Abstract

In this conversation with Andreas Singraber, a post-doc in E-CAM until last month, we discover the full range of his work to expand the Neural Network Potential (NNP) package n2p2 and to improve user accessibility to the code via the LAMMPS package. Andreas talks about new tools that he developed during his E-CAM pilot project, which can provide valuable input for future developments of NNP-based coarse-grained models. He describes how E-CAM has impacted his career and led him to recently join a software company as a scientific software engineer.

With Dr. Andreas Singraber, Vienna Ab initio Simulation Package (VASP)
