Development of an HTC-based, scalable committor analysis tool in OpenPathSampling opens avenues to investigate enzymatic mechanisms linked to Covid-19

The E-CAM HPC Centre of Excellence and a PRACE team in Wroclaw have teamed up to develop High Throughput Computing (HTC) based tools to enable the computational investigation of reaction mechanisms in complex systems. These tools might help us gain understanding of enzymatic mechanisms used by the SARS-CoV-2 main protease [1]

Studying reaction mechanisms in complex systems requires powerful simulations. Committor analysis is a powerful, but computationally expensive tool, developed for this purpose. An alternative, less expensive option, consists in using the committor to generate initial trajectories for transition path sampling. In this project, the main goal was to integrate the committor analysis methodology with an existing software application, OpenPathSampling [2,3] (OPS), that performs many variants of transition path sampling (TPS) and transition interface sampling (TIS), as well as other useful calculations for rare events. OPS is performance portable across a range of HPC hardware and hosting sites..

The Committor analysis is essentially an ensemble calculation that maps straightforwardly to an HTC workflow, where typical individual tasks have moderate scalability and indefinite duration. Since this workflow requires dynamic and resilient scalability within the HTC framework, OPS was coupled to E-CAM’s HTC library jobqueue_features[4] that leverages the Dask [5, 6] data analytics framework and implements support for the management of MPI-aware tasks. 

The HTC library jobqueue_features proved to be resilient and to scale extremely well, meaning it can handle a very high number of simultaneous tasks: a stress test showed it can scale out to 1M tasks on all available architectures. OPS was expanded and its integration with the jobqueue features library was made trivial. In its current state, OPS can now almost seamlessly transition from use on a personal laptop to some of the largest HPC sites in Europe.

Integrating OPS and the HTC library resulted in an unprecedented parallelised committor simulation capability. These tools are currently being implemented for a committor simulation of the SARS-CoV-2 main protease. An initial analysis of the stable states, based on a long trajectory provided by D.E. Shaw Research [7] suggests that a loop region of the protein may act as a gate to the active site. This conformational change may regulate the accessibility of the active site of the main protease, and a better understanding of its mechanism could aid drug design.

The committor simulation can be used to explore the configuration space (taking more initial configurations), or to improve the accuracy of the calculated committor value (running more trajectories per configuration). Altogether, such data will provide insight into the dynamics of the protease loop region and the mechanism of its gate-like activity. In addition, the trajectories generated by the committor simulation can also be used as initial conditions for further studies using the transition path sampling approach.


[1] Milosz Bialczak, Alan O’Cais, Mariusz Uchronski, & Adam Wlodarczyk. (2020). Intelligent HTC for Committor Analysis.

[2] David W.H. Swenson, Jan-Hendrik Prinz, Frank Noé, John D. Chodera, and Peter G. Bolhuis. “OpenPathSampling: A flexible, open framework for path sampling simulations. 1. Basics.” J. Chem. Theory Comput. 15, 813 (2019).

[3] David W.H. Swenson, Jan-Hendrik Prinz, Frank Noé, John D. Chodera, and Peter G. Bolhuis. “OpenPathSampling: A flexible, open framework for path sampling simulations. 2. Building and Customizing Path Ensembles and Sample Schemes.” J. Chem. Theory Comput. 15, 837 (2019).

[4] Alan O’Cais, David Swenson, Mariusz Uchronski, and Adam Wlodarczyk. Task Scheduling Library for Optimising Time-Scale Molecular Dynamics Simulations, August 2019.

[5] Dask Development Team. Dask: Library for dynamic task scheduling, 2016.

[6] Matthew Rocklin. Dask: Parallel computation with blocked algorithms and task scheduling. In Kathryn Hu and James Bergstra, editors, Proceedings of the 14th Python in Science Conference, pages 130 – 136, 2015.[7] No specific author. Long trajectory provided by D.E. Shaw Research., 2020. [Online; accessed 22-Oct-2020].


EESSI-based GitHub Action for Continuous Integration



This module sets up the European Environment for Scientific Software Installations (EESSI) for use in GitHub Workflows.

The European Environment for Scientific Software Installations (EESSI) is a collaboration between a number of academic and industrial partners in the HPC community to set up a shared stack of scientific software installations to avoid the installation and execution of sub-optimal applications on HPC resources. The software stack is intended to work on laptops, personal workstations, HPC clusters and in the cloud, which means the project will need to support different CPUs, networks, GPUs, and so on.

The EESSI project is supported by E-CAM, and forms the basis of the software stack used within the LearnHPC project (which is also supported by E-CAM).

EESSI can be leveraged in continuous integration (CI) workflows to easily provide the dependencies of an application. With this module we create a GitHub Action for EESSI so that it can be used within a projects CI on GitHub. By using the Action, you can use environment modules to resolve the dependencies of your application in highly predictable and reproducible way. This includes state-of-the-art compilers, MPI runtimes and mathematical libraries.

Documentation and source code

Documentation on our software repository here. See also the GitHub repository of the EESSI GitHub Action.


March Module of the Month: DL_MESO (DPD) on Kokkos for enhanced performance portability


This work relates to the implementation of a performance portable version of DL_MESO (DPD) using the Kokkos library. It focuses on porting to DL_MESO (DPD) the first and second loops of the Verlet Velocity (VV) scheme for the time marching scheme. This allows to run DL_MESO on NVidia GPUs as well as on other GPUs or architectures (many-core hardware like KNL), allowing performance portability as well as separation of concern between computational science and HPC.


The VV scheme is made of 3 steps:

  1. a first velocity and particle positions integration by Delta t/2,
  2. a force calculation, and
  3. a second velocity integration by Delta t/2.

Steps 1) and 2) are documented is the following two modules

DL_MESO (DPD) on Kokkos: Verlet Velocity step 1

DL_MESO (DPD) on Kokkos: Verlet Velocity step 2

Note: Kokkos is a C++ library, while DL_MESO (DPD) is in Fortran90 Language. The current implementation requires a transfer between Fortran to C++, due to the use of Fortran pointers not bound using the ISO_C_BINDING standard. This constraint will be removed in future versions of DL_MESO.

Practical application

With the advent of heterogeneous hardware, achieving performance portability across different architectures is one of the main challenges in HPC. In fact, while specific languages, like CUDA, can give best performance for the NVidia hardware, they cannot be used with different GPU vendors limiting the usage across supercomputers worldwide.

In this module we use Kokkos, developed at Sandia National Laboratories, which consist of several C++ templated libraries which provide the capability to offload a workload to several different architectures, taking care of the memory layout and transfer between host and device.

Documentation and source code

The modules documentation is available on our software repository here. The modules have also been pushed into DL_MESO git repository as explained in the modules documentation.


March Module of the Month: n2p2 – Improved link to HPC MD software


This module improves the connection of n2p2 to HPC software, in particular to LAMMPS, by creating a pull request to the official LAMMPS repository. Furthermore, the build process for the n2p2 interface library is enhanced to allow for a selective activation of different interfaces. A first application is also supported: the user contribution of an n2p2 interface to CabanaMD which uses Kokkos to drive MD simulations with NNP support on GPUs.


This module is based on n2p2 (NeuralNetworkPotentialPackage), a C++ code for generation and application of neural network potentials used in molecular dynamics simulations. The source code and documentation are located here:

Although n2p2 was already shipped with source files for patching LAMMPS before (see here) , the build process required manual intervention of users. To avoid this in future versions of LAMMPS a pull request was created to include the n2p2/LAMMPS interface by default as a user package. In order to conform with LAMMPS contribution guidelines multiple issues were resolved, triggering these changes/additions to LAMMPS and n2p2:

• Modify the CMake build process to search and include n2p2

• Create additional documentation about the build settings

• Create a suitable example which can be shipped with LAMMPS

• Adapt documentation of the LAMMPS “pair_style nnp” command

• Change n2p2 to conformwith LAMMPS “bigbig” settings

• Change the source files “pair_nnp.(cpp/h)” to conform with the LAMMPS coding style

Furthermore, the n2p2 build system was adapted to allow for multiple interfaces to other software packages, with an option to select only those of interest to the user. As a first application, the user contributed CabanaMD interface was integrated in the new build process. CabanaMD is an ECP proxy application which makes use of the Kokkos performance portability library and n2p2 to port neural network potentials in MD simulations to GPUs and other HPC hardware.

Practical applications

The integration of neural network potentials directly in LAMMPS via a user package with linkage to n2p2 will greatly enhance the visibility and user experience. User will also be able to retrieve information about the neural network potential method and its use directly on the LAMMPS documentation page. Modifying the n2p2 build process to allow for multiple interfacing software simplifies the development of CabanaMD. This contribution of n2p2 users can be viewed as a precursor of a Kokkos implementation of NNPs in LAMMPS. Ultimately, such an addition to n2p2/LAMMPS would be of great value for the community as it would allow for running molecular dynamics simulation with NNPs on GPUs.

A success story on the E-CAM developments on n2p2 is also available, which explains in more detail all the practical applications from this work: Implementation of High-Dimensional Neural Network Potentials .

Documentation and source Code

Module documentation is available on our software repository here.


Implementation of High-Dimensional Neural Network Potentials



In this conversation with Andreas Singraber, post-doc in E-CAM until last month, we will discover the ensemble of his work to expand the Neural Network Potential (NNP) Package n2p2 and to improve user accessibility to the code via the LAMMPS package. Andreas will talk about new tools that he developed during his E-CAM pilot project, that can provide valuable input for future developments of NNP based coarse-grained models. He will describe how E-CAM has impacted his career and led him to recently integrate a software company as a scientific software engineer.

With Dr. Andreas Singraber, Vienna Ab initio Simulation Package (VASP)

  Continue reading…

February Module of the Month: ALL library implementation in HemeLB, a CoE collaboration


This module describes the work done in E-CAM in cooperation with the HemeLB code from the CompBioMed Centre of Excellence.

HemeLB is a high performance lattice-Boltzmann solver optimised for simulating blood flow through sparse geometries, such as those found in the human vasculature. The code is used within the CompBioMed HPC Centre of Excellence H2020 project and is already highly optimised for HPC usage. Nevertheless, in an E-CAM workshop on the load balancing library ALL hosted at the Juelich Supercomputing Centre, a cooperation was set up in order to analyse and test whether the use of ALL could improve the existing scalability of the code.

ALL was designed to work with particle codes, therefore it was interesting to apply the library to a lattice-Boltzmann solver, which usually is not particle-based. The different grid points of the solution grid were designated as particles and since each of the grid-points already was assigned a workload, the sum of grid-point workloads could be used as domain work load.

As a result, it was demonstrated that the domain compositions provided by ALL show a better theoretical load distribution. Tests to check if this translates into better code performance are inconclusive as yet, due to hardware related issues on the testing platforms. However, these are currently under further investigation, and more definitive results about the performance of the ALL-provided domain decompositions can be expected in the near future. The results were part of an article about HemeLB, which was published in 2020[1] .

Documentation and source code


Towards blood flow in the virtual human: efficient self-coupling of HemeLB
J. W. S. McCullough, R. A. Richardson, A. Patronis, R. Halver, R. Marshall, M. Ruefenacht, B. J. N. Wylie, T. Odaker, M. Wiedemann, B. Lloyd, E. Neufeld, G. Sutmann, A. Skjellum, D. Kranzlmüller and P. V. Coveney
Interface Focus 2020, 11: 20190119
DOI: (open access)


Industry training at the MESOSCALE


To further expand the portfolio of activities targeted at industrialists, E-CAM has established a series of new events targeted at training interested industrial researchers on the simulation and modelling techniques implemented in specific codes and in the direct use of this software for their industrial applications.

The first event of this series will focus on the area of meso- and multiscale simulations and on the flagship code DL_MESO:

Industry Training at the MESOSCALE

22nd – 25th March 2021
Online / UKRI STFC Daresbury Laboratory

In this workshop we will introduce DL_MESO: a software package for mesoscale simulations. Usage of the software will be gradually presented, starting with tutorials based on theoretical background and following up with hands-on sessions. We will focus on the Dissipative Particle Dynamics (DPD) methodology, exploring the different capabilities of DL_MESO_DPD via practical examples that reflect daily industrial challenges. 

DL_MESO has been used for a wide range of problems of both scientific and industrial interest. The code is used, for example, in projects with Unilever, Syngenta and Infineum – to develop DPD parameterisation strategies and simulation protocols to predict important properties of newly-devised surfactant-based formulations; with IBM Research Europe – to model nanofluidic multiphase. The code developers themselves will provide the training. The event is co-organized by Formeric, a company that helps industrial users to study their own formulated projects, primarily by developing a software platform to make it easier for them to access DPD simulations and modelling tools.

As part of the event, UKRI STFC offers a 6-month one seat free licence of DL_MESO 2.7 to be used soon after the end of the event, which will help testing the software.

Don’t miss this opportunity to be trained by the experts on the methods and on the codes themselves! Register for event at

Download event flyer


Another successful online training event !


Our last Extended Software Development Workshop (ESDW) took place on the 18th-22nd January[1], and given its length (5 days) and it’s nature (theory and hands-on training sessions) it was a real success! “The workshop went very well, participants seem to have enjoyed and they lasted until the end !”, said organiser Jony Castagna, computational scientist and E-CAM programmer at UKRI STFC Daresbury Laboratory. The event, organised at the CECAM-UK-DARESBURY Node[2], focused on HPC for mesoscale simulation, and aimed at introducing participants to Dissipative Particle Dynamics (DPD) and the mesoscale simulation package DL_MESO [3] (DL_MESO_DPD). DL_MESO is developed at UKRI STFC Daresbury by Michael Seaton, computational chemist at Daresbury and also an organiser of this event.

Another component of this workshop was parallel programming of hybrid CPU-GPU systems. In particular, DL_MESO has recently been ported to multi-GPU architectures[4] and runs efficiently up to 4096 GPUs, an effort supported by E-CAM (thank you Jony!). Part of this workshop was dedicated to theory lectures and hands-on sessions on GPU architectures and OpenACC (NVidia DLI course) given by Jony, which is an NVidia DLI Certified Instructor. He said “The intention is not only to port mesoscale solvers on GPUs, but also to expose the community to this new programming paradigm, which they can benefit from in their own fields of research”.

All sessions in this ESDW were followed by discussions and hands-on exercises. Organisers were supported by another STFC colleague and former E-CAM post-doc Silvia Chiacchiera. One of the participants wrote “Thank you so much for your effort. This workshop will cause a significant shift in my thinking and approach”.

21 people registered for to the event; but by the third day there were only 9… from which 5 lasted until the last session! A picture taken from the last session talks by itself 🙂

Do you want to join our next training event ? Check out our programme :

Full calendar at





[3] Seaton M.A. et al. “DL_MESO: highly scalable mesoscale simulations”, Molecular Simulation 2013, 39

[4] J. Castagna, X. Guo, M. Seaton and A. O’Cais, “Towards extreme scale dissipative particle dynamics simulations using multiple GPGPUs”,
Computer Physics Communications, 2020, 107159
DOI: 10.1016/j.cpc.2020.107159


January Module of the Month: MaZe, Mass-Zero Constrained Dynamics for Orbital Free Density Functional Theory



The program performs Orbital-Free Density Functional Theory Molecular Dynamics (OF-DFT-MD) using the Mass-Zero (MaZe) constrained molecular dynamics approach described in [1].

This method enforces, at each time step, the Born-Oppenheimer condition that the system relaxes instantaneously to the ground state through the formalism of massless constraints. The adiabatic separation between the degrees of freedom is enforced rigorously, and the algorithm is symplectic and time-reversible in both physical and additional set of degrees of freedom.

The computation of the electronic density is carried out in reciprocal space through a plane-waves expansion so that the mass-zero degrees of freedom are associated to the Fourier coefficients of the electronic density field. The evolution of the ions is performed using Velocity-Verlet algorithm, while the SHAKE algorithm is used for computation of the additional degrees of freedom. The code can sample the NVE and the NVT ensemble, the latter through a Langevin thermostat.

The code was optimised to run on HPC machines, as explained in the software documentation. The proposed optimisations allow a reduction of the execution time by roughly 50% compared to the original version of the code.

Caption: MaZe optimisation of the electronic density at each nuclear step along an orbital-free DFT Born–Oppenheimer trajectory. Very high speed of convergence is achieved by interpreting the optimisation as a constraint solved via an original implementation of the SHAKE algorithm.  The number of iterations needed to converge the electronic density and the time per time step for MaZe (red) and standard conjugate gradients (blue) are compared for the indicated kinetic energy functionals (G_c is the energy cut-off).

Practical application

The code is intended for condensed matter physicists and for material scientists and it can be used for various purposes related to the subject. Even though some analysis tools are included in the package, the main goal of the software is to produce particles trajectories to be analysed in post-production by means of external software.

MaZe implements the orbital-free formulation of density functional theory, in which the optimisation of the energy functional is performed directly in terms of the electronic density without use of Kohn-Sham orbitals. This feature avoids the need for satisfying the orthonormality constraint among orbitals and allows the computational complexity of the code to scale linearly with the dimensionality of the system. The accuracy of the simulation relies on the choice of the kinetic energy functional, which has to be provided in terms of the electronic density alone.

Documentation and source code

The complete documentation is at this location. The source code is available from the E-CAM Gitlab under the MaZe project (software is under embargo until publication leveraging the developments is achieved. Contact code developers or for more information.)


[1] Sara Bonella, Alessandro Coretti, Rodolphe Vuilleumier, Giovanni Ciccotti, “Adiabatic motion and statistical mechanics via mass-zero constrained dynamics”, Phys. Chem. Chem. Phys. 2020, 22, 10775-10785 DOI: 10.1039/D0CP00163E
Pre-print version (open access):


The ALL Load Balancing Library



Scalability of parallel applications depends on a number of characteristics, among which is efficient communication, equal distribution of work or efficient data lay-out. Especially for methods based on domain decomposition, as it is standard for, e.g., molecular dynamics, dissipative particle dynamics or particle-in-cell methods, unequal load is to be expected for cases where particles are not distributed homogeneously, different costs of interaction calculations are present or heterogeneous architectures are invoked, to name a few. For these scenarios the code has to decide how to redistribute the work among processes according to a work sharing protocol or to dynamically adjust computational domains, to balance the workload. The A Load Balancing Library (ALL) developed within E-CAM at the Julich Supercomputing Center aims to provide an easy and portable way to include dynamic domain-based load balancing into particle based simulation codes. It provides several schemes to find the ideal split of the workload, from the simplest orthogonal non staggered domain decomposition, to the more fancy Voronoi mesh scheme. Within this text we provide an overview of ALL, its capabilities and current use cases, as well as where to find additional information on the library.



Most modern parallelized (classical) particle simulation programs are based on a spatial decomposition method as an underlying parallel algorithm: different processors administrate different spatial regions of the simulation domain and keep track of those particles that are located in their respective region. Processors exchange information

  • in order to compute interactions between particles located on different processors
  • to exchange particles that have moved to a region administered by a different processor.

This implies that the workload of a given processor is very much determined by its number of particles, or, more precisely, by the number of interactions that are to be evaluated within its spatial region.

Certain systems of high physical and practical interest (e.g. condensing fluids) dynamically develop into a state where the distribution of particles becomes spatially inhomogeneous. Unless special care is being taken, this results in a substantially inhomogeneous distribution of the processors’ workload. Since the work usually has to be synchronized between the processors, the runtime is determined by the slowest processor (i.e. the one with the highest workload). In the extreme case, this means that a large fraction of the processors are idle during these waiting times. This problem becomes particularly severe if one aims at strong scaling, where the number of processors is increased at constant problem size: Every processor administrates smaller and smaller regions and therefore inhomogeneities will become more and more pronounced. This will eventually saturate the scalability of a given problem, already at a processor number that is still so small that communication overhead remains negligible.

The solution to this problem is the inclusion of dynamic load balancing techniques. These methods redistribute the workload among the processors, by lowering the load of the most busy cores and enhancing the load of the most idle ones. Fortunately, several successful techniques are known already to put this strategy into practice. Nevertheless, dynamic load balancing that is both efficient and widely applicable implies highly non-trivial coding work. Therefore it has not yet been implemented in a number of important codes. 

The A Load-Balancing Library (ALL) developed within E-CAM at the Simulation Laboratory Molecular Systems of the Juelich Supercomputing Centre, aims to provide an easy and portable way to include dynamic domain-based load balancing into particle based simulation codes. It was created in the context of an Extended Software Development Workshop (ESDW) within E-CAM (see ALL ESDW event details), where code developers of CECAM community codes were invited together with E-CAM postdocs, to work on the implementation of load balancing strategies. The goal of this activity is to increase the scalability of applications to a larger number of cores on HPC systems, for spatially inhomogeneous systems, and thus to reduce the time-to-solution of the applications .

Particle system before and after the load balancing. Left: equal domain sizes with bad balance; right: unequal domain sizes and good work load.

ALL includes several load-balancing schemes, with additional approaches currently being added. The following list gives an overview about the currently included schemes: 

  1. Tensor-Product method: For the Tensor-Product method, the work on all processes (subdomains) is reduced over the cartesian planes in the systems. This work is then equalized by adjusting the borders of the cartesian planes.
  2. Staggered Grid Method: For the staggered-grid scheme, a 3-step hierarchical approach is applied: work over the Cartesian planes is reduced before the borders of these planes are adjusted; in each of the Cartesian planes the work is reduced for each Cartesian column, these columns are then adjusted to each other to homogenise the work in each column; the work between neighbouring domains in each column is adjusted. Each adjustment is done locally with the neighbouring planes, columns or domains by adjusting the adjacent boundaries.
  3. Unstructured Mesh Method: In contrast to the Tensor-Product method and the Staggered Grid Method, the unstructured mesh method adjusts domains not by moving boundaries but vertices, i.e. corner points, of domains. For each vertex, a force, based on the differences in work of the neighboring domains, is computed and the vertex is shifted in a way to equalize the work between these neighboring domains.
  4. Voronoi Mesh Method: Similar to the topological mesh method (Unstructured Mesh Method), the Voronoi mesh method computes a force, based on work differences. In contrast to the topological mesh method, the force acts on a Voronoi point rather than a vertex, i.e. a point defining a Voronoi cell, which describes the domain. Consequently, the number of neighbors is not a conserved quantity, i.e. the topology may change over time.
  5. Histogram-based Staggered Grid Method: The histogram-based staggered-grid scheme results in the same grid as the staggered-grid scheme (see Staggered Grid Method), this scheme uses the cumulative work function in each of the three cartesian directions in order to generate this grid. Using histograms and the previously defined distribution of process domains in a cartesian grid, this scheme generates in three steps a staggered-grid result, in which the work is distributed as evenly as the resolution of the underlying histogram allows. In contrast to the other schemes this scheme depends on a global exchange of work between processes.

Use cases

ALL is being tested with the HemeLB code[1] from the Centre of Excellence CompBiomed. A recent paper describes how HemeLB’s developments in memory management and load balancing (with ALL) allow near linear scaling performance of the code on hundreds of thousands of computer codes[2]. 

ALL is implemented in the multi-GPU version of DL_MESO_DPD package (see related news item here). The intention of this integration is to allow for better performance when modelling complex systems with DL_MESO_DPD[3], like large proteins or lipid bilayers, redistributing the work load across the GPUs.



[1] D. Groen, J. Hetherington, H.B. Carver, R.W. Nash, M.O. Bernabeu, and P.V. Coveney. Analysing and modelling the performance of the HemeLB lattice-Boltzmann simulation environment. Journal of Computational Science, 4(5):412 – 422, 2013. doi: // HemeLB URL:

[2] McCullough JWS et al. 2021 Towards blood flow in the virtual human: efficient self-coupling of HemeLB. Interface Focus 11: 20190119. doi: 

[3] MA Seaton, RL Anderson, S Metz and W Smith, DL_MESO: highly scalable mesoscale simulations, Mol Simul 39 (10), 796–821 (2013) doi: //