PLUMED wrapper for OpenPathSampling


PLUMED is a widely used and versatile rare-event sampling and analysis code that can be used with various Molecular Dynamics (MD) engines. It has a very intuitive syntax for the definition of Collective Variables (CVs) and a wide variety of sampling methods, which accounts for its widespread use. The present module allows PLUMED and OPS to be used together. More details on the module can be found here.

Practical application and exploitation of the code

Transition path sampling simulations and analysis rely on accurate state definitions. Such states are typically defined as volumes in a space of Collective Variables. OPS already supports a number of CVs, including those defined in the MDTraj Python library. PLUMED offers a wide variety of additional CVs, which this module makes available in OPS. Many of PLUMED's dozens of CVs have a biomolecular focus, but they are also general enough for other applications. PLUMED's popularity (over 500 citations in the 4 years after the release of PLUMED2 [1]) rests largely on the fact that it works with many MD codes; OPS is now added to that list. The PLUMED code is well maintained and documented for both users and developers, and several tutorials and a mailing list are available to address FAQs. More information about PLUMED is available here.
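As an illustration, the sketch below shows roughly what defining a PLUMED CV in OPS looks like. It is a minimal sketch, assuming the module's PLUMEDInterface and PLUMEDCV classes and hypothetical file names; consult the module documentation for the exact API.

```python
import mdtraj as md
import openpathsampling as paths
from openpathsampling.engines.openmm import trajectory_from_mdtraj

# Load a trajectory and wrap it as an OPS trajectory (hypothetical file name)
traj = trajectory_from_mdtraj(md.load("alanine.h5"))

# PLUMEDInterface wraps the PLUMED kernel for a given topology;
# PLUMEDCV defines a CV using standard PLUMED input syntax.
plumed = paths.PLUMEDInterface(traj.topology)
distance = paths.PLUMEDCV("d1", plumed, "DISTANCE ATOMS=1,2")

values = distance(traj)  # evaluate the PLUMED CV on every frame
```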


[1] G. Tribello, M. Bonomi, D. Branduardi, C. Camilloni, and G. Bussi, PLUMED 2: New feathers for an old bird, Comput. Phys. Commun. 185 (2014) 604


The Curse of Dimensionality in Data-Intensive Modeling in Medicine, Biology, and Diagnostics

With Prof. Tim Conrad (TC), Free University of Berlin, and Dr. Donal Mackernan (DM), University College Dublin.

Abstract

Until recently, the idea that methods rooted in statistical physics could be used to elucidate phenomena and underlying mechanisms in biology and medicine was widely considered a distant dream. Elements of that dream are beginning to be realized, aided very considerably by machine learning and advances in measurement, exemplified by the development of large-scale biomedical data analysis for next-generation diagnostics. In this E-CAM interview of Tim Conrad, the growing importance of diagnostics in medicine and biology is discussed. One difficulty faced by such developments, and shared with particle-based simulation, is the "curse of dimensionality". It is manifest in problems such as: (a) the use of a very large number of order parameters when trying to identify reaction mechanisms, nucleation pathways, metastable states, reaction rates, and polymorphs; (b) machine learning applied to electronic structure problems, where, for example, neural-network-based potentials need very high dimensional basis sets; (c) systematic coarse-graining, which would ideally start from a very high dimensional space and systematically reduce its dimension. The opportunities and challenges for scientists engaging with industry are also discussed. Tim Conrad is Professor of Medical Bioinformatics at the Institute of Mathematics of the Free University of Berlin and head of MedLab, one of the four laboratories of the MODAL research campus. MODAL is a public-private partnership project which conducts mathematical research on data-intensive modeling, simulation, and optimization of complex processes in the fields of energy, health, mobility, and communication. Tim Conrad is also the founder of three successful start-up companies.

In this E-CAM interview with Prof. Tim Conrad, the growing importance of diagnostics in medicine and biology is discussed, including concepts rooted in signal analysis relevant to systematic dimensional reduction and pattern recognition, and the possibilities of applying them to systematic coarse-graining. The opportunities and challenges facing scientists who engage with start-up companies are also discussed, drawing on his experience.


Scientific reports from the 2018 E-CAM workshops are now available on our website


The scientific reports* from the following workshops, conducted in year 3 of the E-CAM project (2018):

  1. E-CAM Scoping Workshop: “Solubility prediction”, 14 – 15 May 2018, Ecole Normale Supérieure de Lyon, France,
  2. E-CAM Scoping Workshop: “Dissipative particle dynamics: Where do we stand on predictive application?”, 24 – 26 April 2018, Daresbury Laboratory, United Kingdom,
  3. E-CAM Extended Software Development Workshop 11: “Quantum Dynamics”, 18 – 29 June 2018, Maison de la Simulation, France,

are now available for download on our website at this location. They will also be integrated into the CECAM Report of Activities for 2018, published every year on the website www.cecam.org.


*© CECAM 2018, all rights reserved.

Please address any comments or questions to info@e-cam2020.eu.


Spring shooting – A module for improving the efficiency of transition path sampling


Transition path sampling is most efficient when paths are generated from the top of the free energy barrier. However, complex (biomolecular) activated processes, such as nucleation or protein binding/unbinding, can have asymmetric and peaked barriers. Uniform shooting point selection is inefficient for such processes, because on average it selects points that are not at the top of the barrier. Paths generated from these points have a low acceptance probability, and accepted transition paths decorrelate slowly, resulting in a low overall efficiency. The spring shooting module was developed to increase the efficiency of path sampling for these types of barriers, without any prior knowledge of the barrier shape. The spring shooting algorithm uses a shooting point selector that is biased with a spring potential, which pulls the selection of points towards the transition state at the top of the barrier. Paths generated from points selected by this biased selector therefore have an increased acceptance probability, and accepted transition paths decorrelate faster, resulting in a higher overall efficiency. The spring shooting algorithm is described in more detail in a paper by Brotzakis and Bolhuis [1]. This module was developed during the ESDW on classical molecular dynamics held in Amsterdam.
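In OPS, the algorithm is exposed through a dedicated move scheme. Below is a minimal sketch under stated assumptions: a TPS network, an engine, and an initial transition path are taken as already set up, parameter names follow the OPS SpringShootingMoveScheme, and the values shown are placeholders rather than recommendations.

```python
import openpathsampling as paths

# Assumes `network` (a TPS network between two stable states), `engine`,
# and an initial transition path `init_traj` have already been set up.
scheme = paths.SpringShootingMoveScheme(
    network=network,
    k_spring=0.01,   # spring constant: strength of the bias on the selector
    delta_max=100,   # maximum shift (in frames) of the shooting point
    engine=engine,
)
initial_conditions = scheme.initial_conditions_from_trajectories(init_traj)
sampler = paths.PathSampling(storage=None,
                             move_scheme=scheme,
                             sample_set=initial_conditions)
sampler.run(1000)  # run 1000 path sampling Monte Carlo steps
```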


[1] Z. F. Brotzakis and P. G. Bolhuis, A one-way shooting algorithm for transition path sampling of asymmetric barriers, J. Chem. Phys. 145 (2016) 164112


Contact Map – a package for analyzing and exploring contacts from a trajectory generated by MD


Contacts can be an important tool for defining (meta)stable states in processes involving biomolecules. For example, an analysis of contacts can be particularly useful when defining bound states during a binding process between proteins, DNA, and small molecules (such as potential drugs).

The contacts analyzed by the contact_map package can be either intermolecular or intramolecular, and can be analyzed on a residue-residue basis or an atom-atom basis.

This package makes it very easy to answer questions like:

  • What contacts are present in a trajectory?
  • Which contacts are most common in a trajectory?
  • What is the difference between the frequency of contacts in one trajectory and another? (Or with a specific frame, such as a PDB entry.)
  • For a particular residue-residue contact pair of interest, which atoms are most frequently in contact?

It also facilitates visualization of the contact matrix, with colors representing the fraction of trajectory time that the contact was present. Full documentation is available at http://contact-map.readthedocs.io/.
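As a quick illustration, here is a minimal sketch of how those questions map onto the package's API; the file names are placeholders, and the trajectory is loaded with MDTraj.

```python
import mdtraj as md
from contact_map import ContactFrequency

# Load a trajectory with MDTraj (placeholder file names)
traj = md.load("trajectory.xtc", top="system.pdb")

# Frequency of residue-residue (and atom-atom) contacts over the trajectory
contacts = ContactFrequency(traj)
print(contacts.residue_contacts.most_common()[:10])  # ten most common contacts

# Compare against a single reference frame, e.g. a PDB structure
ref = ContactFrequency(md.load("reference.pdb"))
diff = contacts - ref            # difference in contact frequencies
diff.residue_contacts.plot()     # heatmap of the differences
```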

Information about software installation and testing, along with a link to the source code, can be found in our E-CAM Software Library here.

Practical application and exploitation of the code

The practical application of this software module is the pilot project on "Binding Kinetics", in collaboration with BiKi Technologies and sustained by an E-CAM postdoctoral researcher at the University of Amsterdam. The project aims at investigating the binding/unbinding of a selective reversible inhibitor for the protein GSK3β.

Contacts between a ligand and a protein are an excellent way to characterize "hotspots": states where the ligand stays for a significant amount of time, but not nearly as long as in the final binding pocket. These hotspots are metastable states in path sampling and should be treated with a multiple-state approach. Identifying those states is therefore a necessary preliminary step in preparing the path sampling simulation.

Other, more general applications of this module include protein-protein aggregation and DNA-protein binding, as well as large-scale conformational changes in biomolecules, such as protein folding.


From Rational Design of Molecular Biosensors to Patent and Potential Start-up


Dr. Donal Mackernan, University College Dublin

Abstract

The power of advanced simulation, combined with statistical theory, experimental know-how and high performance computing, is used to design a protein-based molecular switch sensor with remarkable sensitivity and significant industry potential. The sensor technology has applications across commercial markets including diagnostics, immuno-chemistry, and therapeutics.


Extended Software Development Workshop: Intelligent high throughput computing for scientific applications

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

High throughput computing (HTC) is a computing paradigm focused on the execution of many loosely coupled tasks. It is a useful and general approach to parallelizing (nearly) embarrassingly parallel problems. Distributed computing middleware, such as Celery [1] or COMP Superscalar (COMPSs) [2], can include tools to facilitate HTC, although there may be challenges extending such approaches to the exascale.

Across scientific fields, HTC is becoming a necessary approach in order to fully utilize next-generation computer hardware. As an example, consider molecular dynamics: Excellent work over the years has developed software that can simulate a single trajectory very efficiently using massive parallelization [3]. Unfortunately, for a fixed number of atoms, the extent of possible parallelization is limited. However, many methods, including semiclassical approaches to quantum dynamics [4,5] and some approaches to rare events [6,7], require running thousands of independent molecular dynamics trajectories. Intelligent HTC, which can treat each trajectory as a task and manage data dependencies between tasks, provides a way to run these simulations on hardware up to the exascale, thus opening the possibility of studying previously intractable systems.
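To make the task-farming idea concrete, the sketch below uses Celery to run many independent trajectories as one task group. It is a minimal sketch under stated assumptions: the broker/backend URLs are placeholders, and simulate() stands in for a real MD engine call.

```python
# tasks.py: farm out many independent "trajectories" as Celery tasks.
# Start workers with:  celery -A tasks worker
import random
from celery import Celery, group

app = Celery("md_tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

def simulate(seed):
    """Placeholder trajectory: returns a mock summary statistic."""
    random.seed(seed)
    return sum(random.random() for _ in range(1000)) / 1000.0

@app.task
def run_trajectory(seed):
    # Each task runs one loosely coupled, independent trajectory
    return simulate(seed)

if __name__ == "__main__":
    # Submit 1000 trajectories as a task group; .get() blocks until
    # all workers have finished, then gathers the results.
    results = group(run_trajectory.s(seed) for seed in range(1000))().get()
    print(len(results), "trajectories completed")
```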

In practice, many scientific programmers are not aware of the range of middleware available to facilitate parallel programming. When HTC-like approaches are implemented as part of a scientific software project, this is often done manually, through custom scripts to manage SSH, or by running separate jobs and manually collating the results. Using the intelligent high-level approaches enabled by distributed computing middleware will simplify and speed up development.

Furthermore, middleware frameworks can meet the needs of many different computing infrastructures. For example, in addition to working within a single job on a cluster, COMPSs includes support for working through a cluster’s queueing system or working on a distributed grid. Moreover, architecting a software package such that it can take advantage of one HTC library will make it easy to use other HTC middleware. Having all of these possibilities immediately available will enable developers to quickly create software that can meet the needs of many users.
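The same pattern expressed with COMPSs' Python bindings (PyCOMPSs) looks roughly as follows; the runtime turns each decorated call into a task, resolves the data dependencies, and can schedule the tasks within a single job, through a queueing system, or on a grid. The trajectory body is again a placeholder.

```python
import random
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=1)
def run_trajectory(seed):
    """Placeholder for one independent MD trajectory."""
    random.seed(seed)
    return sum(random.random() for _ in range(1000)) / 1000.0

# Each call returns a future; the runtime schedules the tasks in parallel
futures = [run_trajectory(seed) for seed in range(1000)]
results = compss_wait_on(futures)  # synchronize and gather the results
```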

This E-CAM Extended Software Development Workshop (ESDW) will focus on intelligent HTC as a technique that crosses many domains within the molecular simulation community in general and the E-CAM community in particular. Teaching developers how to incorporate middleware for HTC matches E-CAM’s goal of training scientific developers on the use of more sophisticated software development tools and techniques.

The primary goals of this ESDW are:

1. To help scientific developers interface their software with HTC middleware.
2. To benchmark, and ideally improve, the performance of HTC middleware as applications approach extreme scale.

This workshop will aim to produce four or more software modules related to intelligent HTC, and to submit them, with their documentation, to the E-CAM software module repository. These will include modules adding HTC support to existing computational chemistry codes, where the participants will bring the codes they are developing. They may also include modules adding new middleware or adding features to existing middleware that facilitate the use of HTC by the computational chemistry community. This workshop will involve training both in the general topic of designing software to interface with HTC libraries, and in the details of interfacing with specific middleware packages.

The range of use for intelligent HTC in scientific programs is broad. For example, intelligent HTC can be used to select and run many single-point electronic structure calculations in order to develop approximate potential energy surfaces. Even more examples can be found in the wide range of methods that require many trajectories, where each trajectory can be treated as a task, such as:

* rare events methods, like transition interface sampling, weighted ensemble, committor analysis, and variants of the Bennett-Chandler reactive flux method
* semiclassical methods, including the phase integration method and the semiclassical initial value representation
* adaptive sampling methods for Markov state model generation
* approaches such as nested sampling, which use many short trajectories to estimate partition functions

The challenge is that most developers of scientific software are not familiar with the ways such packages can simplify their development process, and the packages that exist may not yet scale to the exascale. This workshop will introduce scientific software developers to useful middleware packages, work on improving the scaling of that middleware, and provide an opportunity for scientific developers to add support for HTC to their codes.

Major topics that will be covered include:

* Concepts of HTC; how to structure code for HTC
* Accessing computational resources to use HTC
* Interfacing existing C/C++/Fortran code with Python libraries
* Specifics of interfacing with Celery/COMPSs
* Challenges in using existing middleware at extreme scale

[1] Celery: Distributed Task Queue. http://celeryproject.org, date accessed 14 August 2017.

[2] R.M. Badia et al. SoftwareX 3-4, 32 (2015).

[3] S. Plimpton. J. Comput. Phys. 117, 1 (1995).

[4] W.H. Miller. J. Phys. Chem. A 105, 2942 (2001).

[5] J. Beutier et al. J. Chem. Phys. 141, 084102 (2014).

[6] Du et al. J. Chem. Phys. 108, 334 (1998).

[7] G.A. Huber and S. Kim. Biophys. J. 70, 97 (1996).


State-of-the-Art Workshop: Large scale activated event simulations

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

Running on powerful computers, large-scale molecular dynamics (MD) simulations are routinely used to simulate systems of millions of atoms, providing crucial atomistic insight into a variety of processes of interest in physics, materials science, chemistry and biology. For instance, MD simulations are extensively used to study the dynamics and interactions of proteins, understand the properties of solutions, or investigate transport in and on solids. From a technological point of view, molecular dynamics simulations play an important role in many fields such as drug development, the discovery of new materials, oil extraction and energy production.

Indeed, enormous amounts of data are produced every day by molecular dynamics simulations running on high performance computers around the world, and one of the big challenges related to such simulations is to make sense of the data and obtain mechanistic understanding in terms of low-dimensional models that capture the crucial features of the processes under study. Another central challenge is the time scale problem that often affects molecular dynamics simulations: despite the exponential increase in computing power witnessed during the last decades and the development of efficient molecular dynamics algorithms, many processes are characterized by typical time scales that are still far beyond the reach of current computational capabilities. Addressing such time scale problems, and developing scientific software able to overcome them, is one of the central goals of Work Package 1 (WP1 – Classical Molecular Dynamics) of the E-CAM project.

Three fundamental problems are intimately tied to the time scale problem of classical molecular dynamics simulation:

1) The calculation of the populations of metastable states of an equilibrium system. Such populations can be expressed in terms of free energies and hence this problem boils down to the efficient calculation of free energies.

2) The sampling of transition pathways between long-lived (meta)stable states and the calculation of reaction rate constants.

3) The extraction of useful mechanistic information from the simulation data and the construction of low-dimensional models that capture the essential features of the process under study. Such models serve as the basis for the definition of reaction coordinates that enable in-depth studies of the process at hand, e.g. by computing the free energy and kinetics.

The central goal of this workshop is to review new algorithmic developments that address the computational challenges mentioned above with a particular emphasis on implications for industrial applications. In particular, the workshop aims at identifying software modules that should be developed to make efficient and scalable algorithms available to the academic and industrial community. Another goal of the workshop is to identify specific collaboration projects with industrial partners. A dedicated half-day session will be organized specifically for this purpose. To establish the needs of the community and lay out possible directions for development, we will bring together a diverse group of people including software developers, users of HPC infrastructure and industrial researchers.

The proposed workshop is a follow-up to the first E-CAM State-of-the-Art Workshop of WP1, which took place in the summer of 2016 at the Lorentz Center in Leiden, The Netherlands. At that workshop, participants reviewed current rare event methods, including path sampling, milestoning, metadynamics, Markov state modeling, diffusion maps, dimension reduction, reaction coordinate optimization, machine learning, and unsupervised clustering methods, and explored ways to improve them. Particular attention was devoted to the integration of popular MD packages such as Gromacs, NAMD, Charmm, Amber, ACEMD, MOIL and LAMMPS with enhanced analysis and advanced sampling tools, including PLUMED (a package for enhanced sampling and collective variable analysis) and PyEMMA and MSMBuilder (packages for Markov state model analysis).

Notwithstanding the great capabilities of existing methods and software, several challenges remain and will be discussed at the proposed workshop in Vienna:

– Extracting order parameters from molecular simulations to construct low-dimensional models. This point is important because there is no straightforward recipe for reducing the dimensionality to meaningful variables, and progress in this area is urgently needed.

– Efficient methods for sampling rare pathways. Here the goal is to generate molecular trajectory data using advanced sampling algorithms.

– Machine learning algorithms. Automatic analysis methods may offer new ways to guide simulations and construct reaction coordinates from molecular trajectories.

– Better ways to integrate simulations and experiments. It is important to connect the proposed computational methods to experimental probes and integrate experimental information into the analysis of computer simulation data.

More specifically, questions that will be addressed at the proposed workshop include:

1. How can we obtain the best low-dimensional model for the process of interest?

2. How can we use machine learning to find collective variables and reaction coordinates?

3. When can reaction coordinates, which often constitute the slow variables of a process, be used to coarse-grain the dynamics? When not?

4. What if multiple transitions are important? Do we resort to kinetic networks or use multiple reaction coordinates? Should one identify a single (possibly complicated) reaction coordinate, or try to construct a Markov state model (MSM) using many metastable states?

5. When is it possible to reduce a complex problem to diffusion on a one-dimensional free energy landscape, and when do we need a network Markov model?

6. How can experiments test reaction coordinate predictions? How do we connect to experiments?

7. How can extreme-scale computational resources be used efficiently to address these questions?

8. How can progress in these questions help to address problems of industrial interest?


Path density for OpenPathSampling

The path density module implements path density calculations for the OpenPathSampling (OPS) package, including a generic multidimensional sparse histogram and plotting functions for the two-dimensional case. Path density plots provide a way to visualize kinetic information obtained from path sampling, such as the mechanism of a rare event. In addition, the code in this module can also be used to visualize thermodynamic information such as free energy landscapes.

This module has been incorporated into the core of OPS, an open-source Python package for path sampling that wraps around other classical Molecular Dynamics (MD) codes [1]. An easy-to-read article on the use of path sampling methods to study rare events, and the role of the OPS package in performing these simulations, can be found here.

At first glance, a typical path density plot may appear similar to a two-dimensional free energy landscape plot. They are both “heatmap”-type plots, plotting a two-dimensional histogram in some pair of collective variables. However, path density differs from free energy in several important respects:

  • A path density plot is histogrammed according to the number of paths, not the number of configurations: if a cell is visited more than once during a path, it is still only counted once.
  • A path density plot may interpolate across cells that the path jumps over, since the underlying path is assumed to be continuous even when successive stored frames do not fall in adjacent cells.

These differences can prevent metastable regions from overwhelming the transition regions in the plot. When looking at mechanisms, the path density is a more useful tool than the raw configurational probability.
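For orientation, here is a minimal sketch of computing and plotting a path density with OPS. It assumes two collective variables and a set of sampled transition paths already exist, and the class and parameter names follow OPS's PathDensityHistogram; see the module documentation for details.

```python
import openpathsampling as paths
from openpathsampling.numerics import HistogramPlotter2D

# Assumes OPS collective variables `cv1`, `cv2` and a list of sampled
# transition paths `trajectories` already exist.
path_density = paths.PathDensityHistogram(
    cvs=[cv1, cv2],
    left_bin_edges=(0.0, 0.0),  # lower edge of the histogram in each CV
    bin_widths=(0.05, 0.05),    # bin size in each CV
)
counter = path_density.histogram(trajectories)  # sparse per-path histogram

# Render the two-dimensional histogram as a heatmap
plotter = HistogramPlotter2D(path_density)
plotter.plot()
```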

Module documentation can be found here, including a link to the source code. This and other software modules for studying the thermodynamics and kinetics of rare events were recently documented in deliverable D1.2: Classical MD E-CAM modules I, available here.

Motivation and exploitation

The path density is one of the most important tools for visualizing mechanisms, and is often one of the first things to analyze in order to draw scientific conclusions about the mechanism from transition path sampling simulations. This module was used to illustrate the differences between dynamics of the wild-type and oncogenic mutant forms of KRas, as part of one student’s master’s thesis and another student’s bachelor’s thesis at the University of Amsterdam. Results from those projects are currently in preparation for publication [2].


[1] Jan-Hendrik Prinz, David W.H. Swenson, Peter G. Bolhuis, and John D. Chodera. OpenPathSampling: A Python framework for path sampling simulations. I. Introduction and usage. In prep.
[2] Sander Roet, Ferry Hooft, Peter G. Bolhuis, David W.H. Swenson, and Jocelyne Vreede. Simulating the dynamics of oncogenic and wild-type KRas. In prep.


A Conversation on Neural Networks, from Polymorph Recognition to Acceleration of Quantum Simulations


With Prof. Christoph Dellago (CD), University of Vienna, and Dr. Donal Mackernan (DM), University College Dublin.


Abstract

Recently there has been a dramatic increase in the use of machine learning in physics and chemistry, including its use to accelerate simulations of systems at an ab initio level of accuracy, as well as for pattern recognition. It is now clear that these developments will significantly increase the impact of simulations on large-scale systems requiring a quantum level of treatment, for both ground and excited states. These developments also lend themselves to simulations on massively parallel computing platforms, in many cases using classical simulation engines for quantum systems.
