July module of the month: Minimal distance segment to segment with Karush-Kuhn-Tucker conditions

 

Description

The module minDist2segments_KKT returns the minimal distance between two line segments. It uses the Karush-Kuhn-Tucker (KKT) conditions to solve the underlying constrained minimization problem.
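The minimisation can be illustrated with a short sketch. This is a minimal Python/NumPy illustration written for this page, not the module's own source: it parameterises the two segments by s, t in [0, 1], minimises the squared distance, and enumerates the nine active-set combinations (each parameter free, at 0, or at 1), keeping only candidates that satisfy the KKT conditions. The function name `min_dist_segments_kkt` and its tolerance handling are assumptions.

```python
import itertools

import numpy as np


def min_dist_segments_kkt(p0, p1, q0, q1, tol=1e-12):
    """Minimal distance between segments [p0, p1] and [q0, q1] (hypothetical helper).

    Minimises f(s, t) = |p0 + s*d1 - (q0 + t*d2)|**2 subject to 0 <= s, t <= 1
    by enumerating active sets and checking the KKT conditions.
    Returns (distance, s, t).
    """
    p0, p1, q0, q1 = (np.asarray(v, dtype=float) for v in (p0, p1, q0, q1))
    d1, d2, r = p1 - p0, q1 - q0, p0 - q0
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ r, d2 @ r
    eps = tol * (1.0 + a + c + abs(d) + abs(e))  # scale-aware tolerance

    best = None
    # Each parameter is either free (None) or fixed at an active bound (0 or 1).
    for s_fix, t_fix in itertools.product((None, 0.0, 1.0), repeat=2):
        if s_fix is None and t_fix is None:
            det = a * c - b * b
            if det <= tol * a * c:  # parallel or degenerate segments: skip
                continue
            s = (b * e - c * d) / det  # solves  a*s - b*t = -d
            t = (a * e - b * d) / det  #        -b*s + c*t =  e
        elif s_fix is None:
            if a <= eps:
                continue
            t = t_fix
            s = (b * t - d) / a  # df/ds = 0 with t fixed
        elif t_fix is None:
            if c <= eps:
                continue
            s = s_fix
            t = (e + b * s) / c  # df/dt = 0 with s fixed
        else:
            s, t = s_fix, t_fix
        # Primal feasibility: both parameters inside [0, 1].
        if not (-eps <= s <= 1 + eps and -eps <= t <= 1 + eps):
            continue
        s, t = min(max(s, 0.0), 1.0), min(max(t, 0.0), 1.0)
        # Half-gradient of f; the multiplier of an active bound must be >= 0.
        gs = a * s - b * t + d
        gt = -b * s + c * t - e
        if (s_fix == 0.0 and gs < -eps) or (s_fix == 1.0 and gs > eps):
            continue
        if (t_fix == 0.0 and gt < -eps) or (t_fix == 1.0 and gt > eps):
            continue
        dist = float(np.linalg.norm(r + s * d1 - t * d2))
        if best is None or dist < best[0]:
            best = (dist, s, t)
    if best is None:
        raise RuntimeError("no KKT point found (check the tolerance)")
    return best
```

Because the squared distance is convex and the constraints are linear, any KKT point is a global minimiser. For parallel segments, where the interior 2x2 system is singular, a boundary candidate still returns the correct distance.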

Practical application

The present module can be used to avoid topology violations in entangled polymer systems. To preserve topology in a system of entangled polymers, we need to determine the minimal distance between pairs of bonds. Once this is done, either a soft or a hard potential can be applied to prevent the two bonds from crossing. Here, we propose to determine the minimal distance between two segments with the help of the Karush-Kuhn-Tucker conditions.

This module is part of an E-CAM pilot project at the ENS Lyon, focused on the implementation of contact joints to resolve excluded volume constraints.

Background information

A detailed derivation of the minimal distance between two segments using the Karush-Kuhn-Tucker conditions is available at  https://gitlab.e-cam2020.eu:10443/carrivain/mindist2segments_kkt/-/blob/master/minDist2segments_KKT.pdf
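The constrained problem can be summarised as follows. This is a standard KKT statement in our own notation (end points p0, p1 and q0, q1, segment parameters s and t, multipliers μ for the bounds on s and ν for the bounds on t), offered as a reading aid rather than a transcription of the linked derivation:

```latex
\begin{aligned}
&\min_{s,\,t}\; f(s,t) = \lVert \mathbf{r} + s\,\mathbf{d}_1 - t\,\mathbf{d}_2 \rVert^2,
  \quad \mathbf{r} = \mathbf{p}_0 - \mathbf{q}_0,\;
  \mathbf{d}_1 = \mathbf{p}_1 - \mathbf{p}_0,\;
  \mathbf{d}_2 = \mathbf{q}_1 - \mathbf{q}_0,\\
&\text{subject to } 0 \le s \le 1,\; 0 \le t \le 1;\\[4pt]
&\frac{\partial f}{\partial s} = 2\,\mathbf{d}_1\cdot(\mathbf{r} + s\,\mathbf{d}_1 - t\,\mathbf{d}_2),\qquad
 \frac{\partial f}{\partial t} = -2\,\mathbf{d}_2\cdot(\mathbf{r} + s\,\mathbf{d}_1 - t\,\mathbf{d}_2);\\[4pt]
&\text{stationarity: } \frac{\partial f}{\partial s} - \mu_0 + \mu_1 = 0,\qquad
 \frac{\partial f}{\partial t} - \nu_0 + \nu_1 = 0;\\
&\text{dual feasibility: } \mu_0,\ \mu_1,\ \nu_0,\ \nu_1 \ge 0;\\
&\text{complementary slackness: } \mu_0\, s = \mu_1 (1 - s) = \nu_0\, t = \nu_1 (1 - t) = 0.
\end{aligned}
```

Since f is convex and the constraints are linear, any point satisfying these conditions is a global minimiser, and the minimal distance is the square root of f at that point.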

This module is used by other ongoing work, such as the module velocities_resolve_EV, which resolves the excluded volume constraint with a velocity formulation.

Source code

The source code and more information can be found in the minDist2segments_KKT GitLab repository.


E-CAM High Throughput Computing Library

This module is the first in a sequence that will form the overall capabilities of the E-CAM High Throughput Computing (HTC) library. In particular, this module deals with creating a set of decorators to wrap around the Dask-Jobqueue Python library, with the aim of lowering the development-time cost of using it for our use cases.

The initial motivation for this library is driven by the ensemble-type calculations that are required in many scientific fields, and in particular in the materials science domain in which the E-CAM Centre of Excellence operates.

One specific application for this module is the study of “rare events” in theoretical and computational chemistry, a particularly relevant topic for E-CAM. Many problems in biological chemistry, materials science, and other fields involve events that only spontaneously occur after a millisecond or longer (for example, biomolecular conformational changes, or nucleation processes). Since atomistic molecular dynamics uses time steps on the order of a femtosecond, around 10¹² time steps would be needed to see a single millisecond-scale event.
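The order of magnitude follows from a one-line estimate, assuming a time step of about 1 fs and an event time scale of about 1 ms:

```latex
N_{\text{steps}} \approx \frac{t_{\text{event}}}{\Delta t}
  = \frac{10^{-3}\ \text{s}}{10^{-15}\ \text{s}} = 10^{12}
```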

Modern supercomputers are beginning to make it possible to obtain trajectories long enough to observe some of these processes, but to fully characterize a transition with proper statistics, many examples are needed. To obtain many examples, the same application must be run many thousands of times with varying inputs. Managing this kind of computation requires a high throughput computing (HTC) task-scheduling library, whose main elements are task definition, task scheduling and task execution.
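The three elements (task definition, task scheduling, task execution) can be illustrated with a decorator, in the spirit of the decorators this module provides. This is a minimal sketch and not the library's API: it uses Python's standard `concurrent.futures` thread pool as a stand-in for Dask-Jobqueue, and the names `task`, `run_simulation` and `run_ensemble` are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Stand-in scheduler (assumption): a local thread pool instead of Dask-Jobqueue.
_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Task definition: calling the decorated function submits it instead of running it."""
    @wraps(func)
    def submit(*args, **kwargs) -> Future:
        # Task scheduling: the executor decides when and on which worker the call runs.
        return _EXECUTOR.submit(func, *args, **kwargs)
    return submit


@task
def run_simulation(seed: int, temperature: float) -> float:
    """Hypothetical placeholder for one simulation run with the given inputs."""
    # Task execution: a real workflow would launch an MD engine here.
    return temperature + 0.001 * seed


def run_ensemble(inputs):
    """Submit one task per (seed, temperature) pair, then gather results in input order."""
    futures = [run_simulation(seed, temperature) for seed, temperature in inputs]
    return [f.result() for f in futures]
```

For example, `run_ensemble([(seed, 300.0) for seed in range(1000)])` runs the same "application" a thousand times with varying seeds. Keeping the scheduling inside the decorator means the scientific function body stays unchanged when the back end is swapped for a cluster-aware one.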

Although HTC workloads have traditionally been looked down upon in the HPC space, there is a genuine scientific case for running them on extreme-scale resources, and algorithms that require a coordinated approach make efficient libraries implementing it increasingly important in HPC. The 5 Petaflop booster technology of JURECA is an interesting fit for this approach, since offloading heavy computation to the booster maps naturally onto the concept outlined here.

The module documentation is available at https://e-cam.readthedocs.io/en/latest/Classical-MD-Modules/modules/HTC/decorators/readme.html


New publication is out: “Atomistic insight into the kinetic pathways for Watson–Crick to Hoogsteen transitions in DNA”

Title: Atomistic insight into the kinetic pathways for Watson-Crick to Hoogsteen transitions in DNA

Authors: Vreede J, Pérez de Alba Ortíz A, Bolhuis PG, and Swenson DWH

Nucleic Acids Research 2019, Vol. 47, No. 21, 11069–11076, DOI: 10.1093/nar/gkz837 (open access)

Synopsis

DNA predominantly contains Watson–Crick (WC) base pairs, but a non-negligible fraction of base pairs are in the Hoogsteen (HG) hydrogen bonding motif at any time. In the HG motif, the purine is “upside down” compared to the WC motif. Two classes of mechanism have been proposed for the transition between these motifs: one where the base pair stays inside the confines of the helical backbone, and one where one base flips outside of the helical backbone before returning in the “upside down” HG conformation. The transitions between WC and HG may play a role in recognition and replication, but they are difficult to investigate because each transition happens quickly, but only rarely. To gain insight into the mechanisms of this process, the authors performed transition path sampling simulations on a model nucleotide sequence in which an adenine-thymine base pair changes from WC to HG, and found that the outside transition was strongly preferred. The simulated rates and free energy differences agree with experiments, and the simulations provide highly detailed insights into the mechanisms of this process.


pyscal: A python module for structural analysis of atomic environments

Description

pyscal is a Python module for the calculation of local atomic structural environments, including Steinhardt’s bond orientational order parameters [1], during post-processing of atomistic simulation data. The core functionality of pyscal is written in C++ with Python wrappers using pybind11, which allows for fast calculations and easy extensions in Python.
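To illustrate what the bond orientational order parameter q_l measures, here is a minimal NumPy sketch for a single atom, given the vectors to its neighbours. It is not pyscal's API or implementation (pyscal works with spherical harmonics in its C++ core); instead it uses the spherical-harmonic addition theorem, under which q_l squared reduces to an average of Legendre polynomials of the pairwise bond-angle cosines. The function name `steinhardt_q` is hypothetical, and neighbour finding is assumed to have been done already.

```python
import numpy as np
from numpy.polynomial import legendre


def steinhardt_q(neighbor_vectors, l):
    """Steinhardt q_l of one atom from the vectors to its neighbours (hypothetical helper).

    By the addition theorem, sum_m Y_lm(a) Y*_lm(b) = (2l+1)/(4 pi) P_l(a . b), so
    q_l**2 = (4 pi / (2l+1)) * sum_m |q_lm|**2 = (1/N**2) * sum_{j,k} P_l(cos theta_jk).
    """
    r = np.asarray(neighbor_vectors, dtype=float)
    unit = r / np.linalg.norm(r, axis=1, keepdims=True)
    cos_jk = np.clip(unit @ unit.T, -1.0, 1.0)  # pairwise bond-angle cosines
    coeffs = np.zeros(l + 1)
    coeffs[l] = 1.0  # Legendre series selecting P_l only
    n = len(unit)
    q_sq = legendre.legval(cos_jk, coeffs).sum() / n**2
    return float(np.sqrt(max(q_sq, 0.0)))
```

For a simple-cubic neighbour shell (six unit vectors along ±x, ±y, ±z) this gives q4 = √(21/36) ≈ 0.764 and q6 = √(1/8) ≈ 0.354, while the twelve FCC nearest neighbours give q4 ≈ 0.191 and q6 ≈ 0.575, which is why these parameters can tell such lattices apart.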

Practical Applications

Steinhardt’s order parameters are widely used for the identification of crystal structures [3]. They are also used to distinguish whether an atom is in a solid or liquid environment [4]. pyscal is inspired by the BondOrderAnalysis code, but has since incorporated many additional features and modifications. The pyscal module includes the following functionalities:

  • calculation of Steinhardt’s order parameters and their averaged version [2].
  • links with the Voro++ code, for the calculation of Steinhardt parameters weighted using the face areas of Voronoi polyhedra [3].
  • classification of atoms as solid or liquid [4].
  • clustering of particles based on a user-defined property.
  • methods for calculating radial distribution functions, Voronoi volumes of particles, number of vertices and face area of Voronoi polyhedra, and coordination numbers.

Background information

See the application documentation for full details. A paper about pyscal is also available in Ref. [5].

The utilisation of Dask within the project came about as a result of the E-CAM High Throughput Computing ESDW held in Turin in 2018 and 2019.

The software module was developed by Sarath Menon, Grisell Díaz Leines and Jutta Rogal, and is under a GNU General Public License v3.0.

References

[1] Steinhardt, P. J., Nelson, D. R., & Ronchetti, M. (1983). Physical Review B, 28.

[2] Lechner, W., & Dellago, C. (2008). The Journal of Chemical Physics, 129.

[3] Mickel, W., Kapfer, S. C., Schröder-Turk, G. E., & Mecke, K. (2013). The Journal of Chemical Physics, 138.

[4] Auer, S., & Frenkel, D. (2005). Advances in Polymer Science, 173.

[5] Menon, S., Díaz Leines, G., & Rogal, J. (2019). pyscal: A python module for structural analysis of atomic environments. Journal of Open Source Software, 4(43), 1824.


PRACE/E-CAM successful collaboration produces task scheduling library for optimising time-scale molecular dynamics simulations

Challenge

E-CAM is interested in the challenge of bridging timescales. To study molecular dynamics with atomistic detail, time steps on the order of a femtosecond must be used. Many problems in biological chemistry, materials science, and other fields involve events that only spontaneously occur after a millisecond or longer (for example, biomolecular conformational changes, or nucleation processes). That means that around 10¹² time steps would be needed to see a single millisecond-scale event. This is the problem of “rare events” in theoretical and computational chemistry. Modern supercomputers are beginning to make it possible to obtain trajectories long enough to observe some of these processes, but to fully characterize a transition with proper statistics, many examples are needed. To obtain many examples, the same application must be run thousands of times with varying inputs. To manage this kind of computation, a task scheduling library is needed.

Solution and benefits

A Python library was developed in collaboration with PRACE. This library builds on top of the scalable analytics framework Dask and enables it to resiliently manage multi-node and multi-architecture environments. This offers exciting possibilities in the areas of interactive supercomputing and burst supercomputing. A white paper focused on the library was written in collaboration with PRACE and is available here.

The main elements of the scheduling library are task definition, task scheduling (handled in Python) and task execution (facilitated by the MPI layer). Although HTC workloads have traditionally been looked down upon in the HPC space, there is a genuine scientific case for running them on extreme-scale resources, and algorithms that require a coordinated approach make efficient libraries implementing it increasingly important in HPC. The 5 Petaflop booster technology of JURECA is an interesting fit for this approach, since offloading heavy computation to the booster maps naturally onto the concept outlined here.

Reference

Alan O’Cais, David Swenson, Mariusz Uchronski, & Adam Wlodarczyk. (2019, August 14). Task Scheduling Library for Optimising Time-Scale Molecular Dynamics Simulations. Zenodo. http://doi.org/10.5281/zenodo.3527643


Inverse Molecular Design & Inference: building a Molecular Foundry


If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

The overarching theme of this proposed E-CAM Transverse Extended Software Development Workshop is the design and control of molecular machines including sensors, enzymes, therapeutics, and transporters built as fusion proteins or nanocarrier-protein complexes, and in particular, the software development and interfacing that this entails. Several immuno-diagnostic companies and molecular biology experimental groups have expressed a strong interest in the projects at the core of this proposal. The proposed ESDW is transverse as it entails the use of methodologies from two E-CAM Scientific Workpackages: WP1 (Advanced MD/rare-events methods) and WP4 (Mesoscale/Multiscale simulation).

Fusion proteins are sets of two or more protein modules linked together, where the underlying genetic codes of each module and of the fusion protein itself are known or can be easily inferred. The fusion protein typically retains the functions of its components, and in some cases gains additional functions. Fusion proteins occur in nature, but can also be made artificially using genetic engineering and biotechnology, and are used in a wide variety of settings, ranging from unimolecular FRET sensors and novel immuno-based cancer drugs to enzymes [1,2] and energy conversion (for example, efficient generation of alcohol from cellulose) [3,4]. Fusion proteins can be expressed in cell lines using genetic engineering, and purified for in-vitro use using biotechnology. Much of the design work is focused on how different modules are optimally linked or fused together via suitable peptides, rather than on internal changes of modules. Optimizing such designs experimentally can be done, for example, through random mutations, but a more controlled approach based on the underlying molecular mechanisms is desirable. A pragmatic multiscale approach combining bioinformatics and homology, coarse-graining, detailed MD, rare-event based methods and machine learning is ideally suited to this. The figure on the front of this proposal is a representative example of a fusion protein sensor designed to bind to a specific RNA nucleic acid sub-sequence, which causes an optimized hinge-like protein to close and in the process bring two fluorescent proteins together, allowing the binding event to be observed optically through FRET microscopy.

Nanocarriers (NCs) are promising tools for cancer immunotherapy and other diagnostic and therapeutic applications. NCs can be decorated on their surface with molecules that facilitate target-specific antigen delivery to certain antigen-presenting cell types or tumor cells. However, the target cell-specific uptake of nano-vaccines is highly dependent on the modifications of the NC itself. One of these modifications is the formation of a protein corona [5] around the NC after in vivo administration. Appropriate targeting of NCs can be affected by unintended interactions of the NC surface with components of blood plasma and/or with cell surface structures that are unrelated to the specific targeting structure. The protein corona around NCs may affect their organ-specific or cell type-specific trafficking as well as their endocytosis and/or functional properties. Most importantly, the protein corona has been shown to interfere with targeting moieties used to induce receptor-mediated uptake of the NC, both inhibiting and enhancing internalization by specific cell types [5]. Moreover, the protein corona is taken up by the target cell, which may alter the function of that cell. Therefore, tailoring the surface properties of the NC to facilitate the adsorption of specific proteins and control the structure of the corona can help to significantly improve NC performance. Modification of surface properties, e.g. via grafting oligomers, is also known to affect the preferred orientation of adsorbed proteins and, therefore, their functionality [6]. The molecular design would include the selection of an appropriate NC coating and of the type of antibody to optimize NC uptake.

Mesoscale simulation is required to understand the thermodynamics and kinetics of protein adsorption on the NCs with engineered surfaces [7] and to achieve the desired structure with preferred adsorption of the selected antigen. However, the aforementioned issues often require biological and chemical accuracy that typical mesoscale models cannot achieve unless buttressed by accurate simulations at an atomistic/molecular level, rare-event methods and machine learning.

A pragmatic approach towards the enhancement of fusion proteins and NCs is as follows.

(i) Molecular designs are initially developed and optimized as simple CG models and include the use of information theory and machine learning.

(ii) The solution of the inverse problem of building the fusion protein or the NC-protein complex to match the design requires a multiscale approach combining mesoscale modeling, molecular dynamics, rare-event methods, machine learning, homology, mutation and solvent conditions.

(iii) Iterate steps (i) and (ii) to optimize the design, and in the process collect data for machine learning driven design.

(iv) Final validation using detailed MD, rare-event methods and HPC.

Over the course of two 5-day meetings separated by several months, the planned ESDW will produce multiple software modules, including the following.

(a) C/C++/Modern Fortran or Python based codes to build and optimize simple CG models of fusion proteins or NC-protein complexes using information theory and machine learning.

(b) Semi-automated pipelines to solve the inverse problem of building the fusion protein or the NC to match the design. This will involve interfacing with MD/mesoscale engines such as LAMMPS, GROMACS, OpenMM and ESPResSo, rare-event codes such as PLUMED, and bioinformatics codes such as I-TASSER and IntFOLD.

(c) Particle insertion/deletion methods for alchemical transformations: mutation of amino acids, changes in the solvent, and the associated changes in free energy properties.

(d) Codes to add corrections to coarse-grained models (bead models/MARTINI) using detailed atomistic data (e.g. potentials of mean force for key order parameters, structure factors, etc.) or experimental data where available.
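For item (c), one standard way to turn insertion/deletion or mutation moves into free-energy changes is the Zwanzig free-energy perturbation formula, ΔF = -kT ln ⟨exp(-ΔU/kT)⟩₀, of which Widom's test-particle insertion for the excess chemical potential is a special case. The sketch below is a generic NumPy illustration of that estimator, not code from the workshop; the function name is hypothetical, and it assumes the energy differences have already been computed on configurations sampled from the reference state.

```python
import numpy as np


def free_energy_perturbation(delta_u, kT=1.0):
    """Zwanzig estimate dF = -kT * ln < exp(-dU / kT) >_0 (hypothetical helper).

    delta_u: energy differences U_1 - U_0 evaluated on configurations sampled
    from state 0; for Widom insertion, the interaction energy of a trial particle.
    """
    x = -np.asarray(delta_u, dtype=float) / kT
    x_max = x.max()
    # Log-mean-exp evaluated stably, so exp() neither overflows nor underflows.
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return float(-kT * log_mean)
```

For instance, samples of ΔU equal to 0 and ln 3 (in units of kT) give ΔF = ln 1.5 ≈ 0.405 kT, below the plain average of ΔU, as the exponential average always weights low-energy samples more heavily.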

While this is an ambitious plan, it is worth pointing out that a similar integrated approach to protein development has already been taken by the lab of John Chodera [8]. While it did not focus on fusion proteins or NC-protein complexes, nor systematically incorporate coarse-graining, it demonstrates both the feasibility of what we propose here and how to achieve practical solutions. Other ideas for a systematic approach to molecular design using MD simulation have also been proposed recently [9,10].

 

References

[1] H. Yang et al., The promises and challenges of fusion constructs in protein biochemistry and enzymology, Appl. Microbiol. Biotechnol. (2016)
[2] A. Bochicchio et al., Designing the Sniper: Improving Targeted Human Cytolytic Fusion Proteins for Anti-Cancer Therapy via Molecular Simulation, Biomedicines 5(1), 9 (2017)
[3] Y. Fujita et al., Direct and Efficient Production of Ethanol from Cellulosic Material with a Yeast Strain Displaying Cellulolytic Enzymes, Appl. Environ. Microbiol. 68(10), 5136–5141 (2002)
[4] M. Gunnoo et al., Nanoscale Engineering of Designer Cellulosomes, Adv. Mater. 28(27), 5619-4 (2016)
[5] M. Bros et al., The Protein Corona as a Confounding Variable of Nanoparticle-Mediated Targeted Vaccine Delivery, Front. Immunol. 9, 1760 (2018)
[6] I. Lieberwirth et al., The Role of the Protein Corona in the Uptake Process of Nanoparticles, Proceedings of Microscopy & Microanalysis 24, Supplement S1 (2018)
[7] H. Lopez et al., Multiscale Modelling of Bionano Interface, Adv. Exp. Med. Biol. 947, 173-206 (2017)
[8] D. L. Parton et al., Ensembler: Enabling High-Throughput Molecular Simulations at the Superfamily Scale, PLoS Comput. Biol. 12(6), e1004728 (2016)
[9] P. V. Komarov et al., A new concept for molecular engineering of artificial enzymes: a multiscale simulation, Soft Matter 12, 689-704 (2016)
[10] B. A. Thurston et al., Machine learning and molecular design of self-assembling π-conjugated oligopeptides, Mol. Sim. 44, 930-945 (2018)
[11] D. Carroll, Genome Engineering with Targetable Nucleases, Annu. Rev. Biochem. 83, 409–439 (2014)


Integrating LAMMPS with OpenPathSampling

This module shows how LAMMPS can be used as the Molecular Dynamics (MD) engine in OpenPathSampling (OPS), and it also provides a benchmark of the impact of the OPS overhead on the MD engine.

Practical application and exploitation of the code

OpenPathSampling uses OpenMM as the default engine for calculating the sampled trajectories. Other engines such as GROMACS and LAMMPS can also be used (although they are not yet available in the official release), which makes it possible to exploit different computer architectures, such as hybrid CPU-GPU systems, and to simulate more complex problems.

In this module we present the source code for the integration of OPS with LAMMPS, as well as a benchmark of a simple test case showing the performance impact of the OPS overhead.

Software documentation and a link to the source code can be found in our E-CAM software library here.


ESDW: Topics in Classical MD


If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

Classical molecular dynamics (MD) is a broad field, with many domains of expertise. Those specialist domains include topics like transition path sampling (which harvests many examples of a process in order to study it at a statistical level [1]), metadynamics (which runs a trajectory with modified dynamics that enhance sampling, and from which free energy profiles can be constructed [2]), as well as various topics focused on the underlying dynamics, either by providing better representations of the interactions between atoms (e.g., force fields [3] or neural network potentials [4]) or by changing the way the dynamics are performed (e.g., integrators [5]).

Frequently, experts in one domain are not experienced with the software of other domains. This workshop aims to combine both depth, by extending domain-specific software, and breadth, by providing participants an opportunity to learn about software from other domains. As an extended software development workshop (ESDW), a key component of the workshop will be the development of modules that extend existing software packages. Ideally, some modules may connect multiple domain-specific packages.

Topics at this workshop will include using and extending modern MD software in the domains of:

* advanced path sampling methods (and the software package OpenPathSampling)
* metadynamics and the calculation of collective variables (and the software package PLUMED)
* machine learning for molecular dynamics simulations (including local structure recognition and representation of potential energy surfaces)

In addition, this workshop will feature an emphasis on performance testing and benchmarking software, with particular focus on high performance computing. This subject is relevant to all specialist domains.

By combining introductions to software from different specialist fields with an opportunity to extend domain-specific software, this workshop is intended to provide opportunities for cross-pollination between domains that often develop independently.

References

[1] Bolhuis, P.G. and Dellago, C. Trajectory‐Based Rare Event Simulations. Reviews in Computational Chemistry, 27, p. 111 (2010).
[2] A. Laio and F.L. Gervasio. Rep. Prog. Phys. 71, 126601 (2008).
[3] J.A. Maier, C. Martinez, K. Kasavajhala, L. Wickstrom, K.E. Hauser, and C. Simmerling. J. Chem. Theory Comput. 11, 3696 (2015).
[4] T. Morawietz, A. Singraber, C. Dellago, and J. Behler. Proc. Natl. Acad. Sci. USA, 113, 8368 (2016).
[5] B. Leimkuhler and C. Matthews. Appl. Math. Res. Express, 2013, 34 (2013).


E-CAM related work labeled as “Excellent Science” by the EC Innovation Radar Initiative

The Innovation Radar aims to identify high-potential innovations and innovators. It is an important source of actionable intelligence on innovations emerging from research and innovation projects funded through European Union programmes.

E-CAM is associated with the following innovations (innovation topic: Excellent Science):

    1. Improved Simulation Software Packages for Molecular Dynamics (see link)
    2. Improved software modules for Meso- and multi-scale modelling (see link)

These innovations are related to the work of our E-CAM-funded postdoctoral researchers, supervised by scientists in the team, working on:

  • Development of the OpenPathSampling package to study rare events (Universiteit van Amsterdam). Link
  • Implementation of GPU version of DL_MESO_DPD (Hartree Centre (STFC)). Link
  • Development of polarizable mesoscale model for DL_MESO_DPD (Hartree Centre (STFC)). Link
  • Development of the GC-AdResS scheme (Freie Universitaet Berlin). Link
  • Implementation of a hierarchical strategy on ESPResSo++ (Max Planck Institute for Polymer Research, Mainz). Link

Scientific Report from State-of-the-Art Workshop “Large Scale activated event simulations” is available on our website

The scientific report from the E-CAM State-of-the-Art Workshop Large scale activated event simulations, which took place on 1-3 October 2018 at the CECAM-AT Node (Austria), is now available for consultation and download on our website under this link.

Short Description:

The State-of-the-Art workshop in the E-CAM classical molecular simulation work-package (WP1) brought together 40 participants including scientists from non-academic research centres, to discuss computational approaches capable of addressing time scale problems in complex systems in materials science and biophysics. Scientific discussions at the workshop centred around three fundamental computational challenges closely related to the time scale problem of classical MD simulation: (1) The calculation of the populations of metastable states of an equilibrium system; (2) The sampling of transition pathways between long-lived (meta)stable states and the calculation of reaction rate constants; and (3) The extraction of useful mechanistic information from the simulation data and the construction of low-dimensional models that capture the essential features of the process under study. The main outcomes from each discussion are described in the workshop report.

Two open discussion sessions focused on efficient path sampling methods and the identification of reaction coordinates, and on how machine learning approaches can be used to make progress in this area. Another important goal of the workshop was to discuss how to facilitate the use of simulation and modelling in industrial settings; workshop participants with industrial experience emphasised the importance of detailed project management and, in particular, the need to have very clear agreements about intellectual property rights.

Other scientific reports from State-of-the-Art and Scoping workshops can be found here: https://www.e-cam2020.eu/scientific-reports/.
