GRASP Sampling – a module to build a representative data set for a fitting procedure

GRASP_sampling performs a stratified sampling of the configurations of a system, each described by a vector, to build a representative training set for a fitting procedure. Given a list of candidate configurations and the required size (N) of the training set, the module carries out the combinatorial optimization that maximizes the following dissimilarity score (DS) among the elements of the training set:

[Image: formula of the dissimilarity score (DS)]

In this formula, the j-th configuration in the sum is the j-th nearest one to the l-th configuration, and d_lj is the Euclidean distance between the l-th and j-th configurations. M is the number of nearest configurations considered in the score. The exponential weight makes the score nearly independent of the particular value of M, provided M is larger than about 4 to 6.

The combinatorial optimization that maximizes the dissimilarity score is performed using the greedy randomized adaptive search procedure (GRASP) algorithm [1]. A stratified sampling can be performed without a combinatorial optimization using classical statistical techniques (for example, Latin hypercube sampling). GRASP sampling becomes useful when the selection is restricted to a predetermined set of configurations, generated or sampled with specific internal constraints. This is the case for the molecular configurations generated in a molecular dynamics simulation.
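As an illustration only, the selection described above can be sketched in Python. The exact DS formula is not reproduced in this text, so the sketch assumes a score in which the distance to the j-th nearest selected neighbour is weighted by exp(-(j-1)). The names `dissimilarity_score` and `grasp_sample`, and the parameters `alpha` and `n_restarts`, are hypothetical and are not taken from the GRASP_sampling module.

```python
import numpy as np

def dissimilarity_score(X, subset, M=5):
    """Assumed score (not the module's exact formula): for every selected
    configuration, sum the distances to its M nearest selected neighbours,
    weighting the j-th nearest one by exp(-(j-1))."""
    pts = X[list(subset)]
    # Pairwise Euclidean distances inside the subset.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)      # exclude self-distances
    d.sort(axis=1)                   # nearest neighbours first
    m = min(M, len(pts) - 1)
    weights = np.exp(-np.arange(m))
    return float((d[:, :m] * weights).sum())

def grasp_sample(X, N, M=5, alpha=0.3, n_restarts=10, seed=0):
    """Generic GRASP: greedy randomized construction, then swap local search."""
    rng = np.random.default_rng(seed)
    n = len(X)
    best, best_score = None, -np.inf
    for _ in range(n_restarts):
        # Construction phase: grow the subset one configuration at a time,
        # picking at random from a restricted candidate list (RCL).
        subset = [int(rng.integers(n))]
        while len(subset) < N:
            candidates = [c for c in range(n) if c not in subset]
            gains = np.array([dissimilarity_score(X, subset + [c], M)
                              for c in candidates])
            threshold = gains.max() - alpha * (gains.max() - gains.min())
            rcl = [c for c, g in zip(candidates, gains) if g >= threshold]
            subset.append(int(rng.choice(rcl)))
        # Local search phase: accept any single swap that raises the score.
        score = dissimilarity_score(X, subset, M)
        improved = True
        while improved:
            improved = False
            for i in range(N):
                for c in range(n):
                    if c in subset:
                        continue
                    trial = subset[:i] + [c] + subset[i + 1:]
                    s = dissimilarity_score(X, trial, M)
                    if s > score + 1e-12:
                        subset, score, improved = trial, s, True
        if score > best_score:
            best, best_score = sorted(subset), score
    return best, best_score
```

Here `X` would hold one candidate configuration per row, for example descriptor vectors extracted from a molecular dynamics trajectory, and `grasp_sample(X, N)` returns the indices of the selected training set together with its score. The brute-force score evaluation is kept for readability and is only practical for small candidate sets.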

The complete module documentation, including a link to the source code, can be found in our repository here.

Motivation and exploitation

The application of the GRASP algorithm to stratified sampling is described in a recent publication [2] by the E-CAM partners at Scuola Normale Superiore (SNS), which we previously reported here.

The motivation behind this software module is the pilot project with industry “Quantum Mechanical Parameterisation of Metal Ions in Proteins”, carried out by an E-CAM postdoctoral researcher from SNS.

 

[1] Feo, T. A.; Resende, M. G. Greedy randomized adaptive search procedures. J. Glob. Optim. 1995, 6, 109−133

[2] Fracchia, F.; Del Frate, G.; Mancini, G.; Rocchia, W.; Barone, V. Force Field Parametrization of Metal Ions from Statistical Learning Techniques. J. Chem. Theory Comput. 2018, 14, 255−273


From Rational Design of Molecular Biosensors to Patent and potential Start-up

 

Dr. Donal Mackernan, University College Dublin

Abstract

The power of advanced simulation, combined with statistical theory, experimental know-how and high performance computing, is used to design a protein-based molecular switch sensor with remarkable sensitivity and significant industry potential. The sensor technology has applications across commercial markets including diagnostics, immuno-chemistry, and therapeutics.

 


New publication using the GC-AdResS molecular dynamics technique

 

The publication “Probing spatial locality in ionic liquids with the grand canonical adaptive resolution molecular dynamics technique” (GC-AdResS), by the Theoretical and Mathematical Physics in Molecular Simulation group of the Freie Universität Berlin, led by Prof. Luigi Delle Site, an E-CAM partner, describes the use of GC-AdResS to test the spatial locality of the ionic liquid 1-ethyl 3-methyl imidazolium chloride. The main feature of GC-AdResS is the possibility of coupling two simulation boxes together, combining the advantages of classical atomistic simulations with those of coarse-grained simulations.

The post-print version of the publication is open access and can be downloaded directly from the Zenodo repository here. The publisher's (AIP) version can be found at http://aip.scitation.org/doi/10.1063/1.5009066.

E-CAM currently runs a pilot project on the development of the GC-AdResS scheme. One of its goals is to develop a library or recipe with which GC-AdResS can be implemented in any MD code. The current focus is on adjusting the version of GC-AdResS implemented in GROMACS. The long-term goal of this project is to encourage the community to use GC-AdResS as a tool for multiscale simulations and analysis. More information about this pilot project can be found here.

Article

Title: Probing spatial locality in ionic liquids with the grand canonical adaptive resolution molecular dynamics technique

Authors:  B. Shadrack Jabes, C. Krekeler, R. Klein and L. Delle Site

Abstract: We employ the Grand Canonical Adaptive Resolution Simulation (GC-AdResS) molecular dynamics technique to test the spatial locality of the 1-ethyl 3-methyl imidazolium chloride liquid. In GC-AdResS, atomistic details are kept only in an open sub-region of the system while the environment is treated at coarse-grained level; thus, if spatial quantities calculated in such a sub-region agree with the equivalent quantities calculated in a full atomistic simulation, then the atomistic degrees of freedom outside the sub-region play a negligible role. The size of the sub-region fixes the degree of spatial locality of a certain quantity. We show that even for sub-regions whose radius corresponds to the size of a few molecules, spatial properties are reasonably reproduced thus suggesting a higher degree of spatial locality, a hypothesis put forward also by other researchers and that seems to play an important role for the characterization of fundamental properties of a large class of ionic liquids.

The Journal of Chemical Physics 148, 193804 (2018)

E-CAM program of events 2018 is out

Check out our program of events for this year, running from April 2018 to February 2019:

E-CAM Events 2018

See the workshop details to learn how to apply. E-CAM events are part of the annual CECAM flagship program and are hosted at the various CECAM node locations.

E-CAM runs three types of events every year:

  • Scoping workshops (SCOWs)
  • State-of-the-art workshops (SAWs)
  • Extended Software Development Workshops (ESDWs)

For their definitions, see here. If you require any further information, contact us at info@e-cam2020.eu.

 


New article is out: “Force Field Parametrization of Metal Ions from Statistical Learning Techniques”

 

This paper from E-CAM partners at Scuola Normale Superiore (Pisa, Italy) describes a novel statistical procedure, developed to optimize the parameters of non-bonded force fields of metal ions in soft matter. The paper is open access and can be downloaded directly from ACS’s page at http://pubs.acs.org/doi/10.1021/acs.jctc.7b00779.

This work was performed in the context of the E-CAM pilot project on Quantum Mechanical Parameterisation of Metal Ions in Proteins, which is a collaboration with BiKi Technologies. The list of software modules associated with the pilot project (and this publication) can be found here.

Article

Title: Force Field Parametrization of Metal Ions from Statistical Learning Techniques

Authors: Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone

Abstract: A novel statistical procedure has been developed to optimize the parameters of non-bonded force fields of metal ions in soft matter. The criterion for the optimization is the minimization of the deviations from ab initio forces and energies calculated for model systems. The method exploits the combination of the linear ridge regression and the cross-validation techniques with the differential evolution algorithm. Wide freedom in the choice of the functional form of the force fields is allowed since both linear and non-linear parameters can be optimized. In order to maximize the information content of the data employed in the fitting procedure, the composition of the training set is entrusted to a combinatorial optimization algorithm which maximizes the dissimilarity of the included instances. The methodology has been validated using the force field parametrization of five metal ions (Zn2+, Ni2+, Mg2+, Ca2+, and Na+) in water as test cases.

 


Geomoltools: A set of software modules to easily manipulate molecular geometries

Geomoltools is a set of eight pre- and post-treatment Fortran codes for easily manipulating molecular geometries. They make it possible to minimize the average energy obtained for a range of internuclear distances for the dimers of each element, and to decrease the computational cost of a DFT calculation.

The codes are:

  • mol2xyz: converts a .mol file into an ordered .xyz file
  • pastemol: joins two .xyz files
  • movemol: translates and aligns the molecule with some predefined axes
  • stackmol: generates (manually or randomly) different stacking arrangements between two molecules
  • geodiff: compares the internal coordinates of two molecules
  • xyz2zmt_s: converts the Cartesian coordinates contained in a .xyz file into a Z-matrix (2 possible formats)
  • zmt2xyz_s: converts a Z-matrix (from 2 possible formats) into Cartesian coordinates
  • ucubcellgen: calculates the vectors of a unit cell given some atomic coordinates.

The source code of the modules can be found here. For a detailed explanation of the main programs, please have a look at this file. A complete tutorial on how to use the different codes from the Geomoltools package to manipulate (rotate, translate, join, pack, convert, etc.) molecular geometries can be found at this address.

Motivation and exploitation

These modules have been used to study the stacking arrangements of acceptor:donor molecules for organic photovoltaic polymers by high-throughput computation with the SIESTA code. This set of codes is available under the GNU General Public License (GPL) version 2.


Issue 6 – October 2017

 

E-CAM Newsletter of October 2017


Path density for OpenPathSampling

The path density module implements path density calculations for the OpenPathSampling (OPS) package, including a generic multidimensional sparse histogram, and plotting functions for the two-dimensional case. Path density plots provide a way to visualize kinetic information obtained from path sampling, such as the mechanism of a rare event. In addition, the code in this module can also be used to visualize thermodynamic information such as free energy landscapes.

This module has been incorporated into the core of OPS, an open-source Python package for path sampling that wraps around other classical molecular dynamics (MD) codes [1]. An easy-to-read article on the use of path sampling methods to study rare events, and on the role of the OPS package in performing these simulations, can be found here.

At first glance, a typical path density plot may appear similar to a two-dimensional free energy landscape plot. They are both “heatmap”-type plots, plotting a two-dimensional histogram in some pair of collective variables. However, path density differs from free energy in several important respects:

  • A path density plot is histogrammed according to the number of paths, not the number of configurations. So if a cell is visited more than once during a path, it still only gets counted once.
  • A path density plot may interpolate across cells that the path jumps over. This is because it is assumed that the input must actually be continuous.

These differences can prevent metastable regions from overwhelming the transition regions in the plot. When looking at mechanisms, the path density is a more useful tool than the raw configurational probability.
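The two counting rules listed above can be sketched in plain Python. This is a minimal illustration and not the OPS implementation: the function names `cells_on_segment` and `path_density`, the square bins of width `bin_width`, and the simple linear interpolation between cells are all assumptions made for the example.

```python
import math
from collections import Counter

def cells_on_segment(a, b):
    """Cells crossed when moving from cell a to cell b. Skipped cells are
    filled in by linear interpolation, an assumed stand-in for the
    interpolation mentioned in the text."""
    steps = max(abs(b[0] - a[0]), abs(b[1] - a[1]))
    if steps == 0:
        return [a]
    return [(round(a[0] + (b[0] - a[0]) * k / steps),
             round(a[1] + (b[1] - a[1]) * k / steps))
            for k in range(steps + 1)]

def path_density(paths, bin_width=1.0):
    """Sparse 2D histogram: each cell counts the number of paths that
    visit it, not the number of frames that fall inside it."""
    counts = Counter()
    for path in paths:
        # Map each frame (two collective-variable values) to a grid cell.
        cells = [(math.floor(x / bin_width), math.floor(y / bin_width))
                 for x, y in path]
        visited = set(cells[:1])
        for a, b in zip(cells, cells[1:]):
            visited.update(cells_on_segment(a, b))
        counts.update(visited)   # a set, so each cell is counted once per path
    return counts

# Two short paths in a pair of collective variables.
paths = [
    [(0.5, 0.5), (0.6, 0.4), (3.5, 0.5)],   # lingers in one cell, then jumps
    [(0.5, 0.5), (0.5, 1.5)],
]
density = path_density(paths)
```

In this example the first path spends two frames in cell (0, 0) but contributes only one count there, and the cells (1, 0) and (2, 0) that it jumps over are filled in by the interpolation.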

Module documentation can be found here, including a link to the source code. This and other software modules for studying the thermodynamics and kinetics of rare events were recently documented in deliverable D1.2: Classical MD E-CAM modules I, available here.

Motivation and exploitation

The path density is one of the most important tools for visualizing mechanisms, and is often one of the first things to analyze in order to draw scientific conclusions about the mechanism from transition path sampling simulations. This module was used to illustrate the differences between dynamics of the wild-type and oncogenic mutant forms of KRas, as part of one student’s master’s thesis and another student’s bachelor’s thesis at the University of Amsterdam. Results from those projects are currently in preparation for publication [2].

 

[1] Jan-Hendrik Prinz, David W.H. Swenson, Peter G. Bolhuis, and John D. Chodera. OpenPathSampling: A Python framework for path sampling simulations. I. Introduction and usage. In prep.
[2] Sander Roet, Ferry Hooft, Peter G. Bolhuis, David W.H. Swenson, and Jocelyne Vreede. Simulating the dynamics of oncogenic and wild-type KRas. In prep.


Second-Order Differencing Scheme

This module, SodLib, provides exact wavefunction propagation using the second-order differencing (SOD) integrator scheme to solve the time-dependent Schrödinger equation, as described by Leforestier et al., J. Comput. Phys. 94, 59-80 (1991). Within this scheme, the time interval is determined by dividing ħ by the eigenvalue of the Hamiltonian operator with the largest absolute value. This routine has been implemented and tested as an added functionality within the Quantics software package, available through CCPForge.
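For readers unfamiliar with the scheme, the SOD update ψ(t+Δt) = ψ(t−Δt) − (2iΔt/ħ) H ψ(t) can be sketched for a small matrix Hamiltonian with NumPy. This is an illustrative sketch and not the SodLib or Quantics code: the function name `sod_propagate`, the `safety` factor applied to ħ/|E_max|, and the second-order Taylor expansion used for the first step are assumptions made for the example.

```python
import numpy as np

def sod_propagate(H, psi0, t_final, hbar=1.0, safety=0.2):
    """Propagate psi0 to t_final with second-order differencing.
    H is a Hermitian matrix with at least one non-zero eigenvalue."""
    H = np.asarray(H, dtype=complex)
    psi0 = np.asarray(psi0, dtype=complex)
    # Time-step bound from the text: hbar / |largest eigenvalue of H|.
    e_max = np.max(np.abs(np.linalg.eigvalsh(H)))
    dt_limit = hbar / e_max
    # Assumed choice: use a fraction `safety` of that bound.
    n_steps = max(2, int(np.ceil(t_final / (safety * dt_limit))))
    dt = t_final / n_steps
    # SOD needs two starting points; psi(dt) is obtained here from a
    # second-order Taylor expansion (an assumption of this sketch).
    h_psi = H @ psi0
    psi_prev = psi0
    psi = psi0 - 1j * dt / hbar * h_psi - 0.5 * (dt / hbar) ** 2 * (H @ h_psi)
    for _ in range(n_steps - 1):
        # psi(t + dt) = psi(t - dt) - (2i dt / hbar) H psi(t)
        psi_next = psi_prev - 2j * dt / hbar * (H @ psi)
        psi_prev, psi = psi, psi_next
    return psi, dt

# Two-level test system with a known analytic solution:
# psi(t) = (cos t, -i sin t) for H = [[0, 1], [1, 0]] and hbar = 1.
H = np.array([[0.0, 1.0], [1.0, 0.0]])
psi, dt = sod_propagate(H, [1.0, 0.0], t_final=1.0, safety=0.02)
exact = np.array([np.cos(1.0), -1j * np.sin(1.0)])
```

For this two-level system the propagated state can be compared directly with the analytic solution, which gives a quick check that the chosen time step is small enough.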

Quantics is a package for studying chemical reactions of molecules; its main developer (G. Worth, University College London) is a member of E-CAM’s WP3 – Quantum Dynamics. It incorporates a variety of quantum dynamical methods that share a common feature: the state of the system is usually described by wavefunctions (containing the quantum analogue of the information given by positions and velocities for classical atoms). It is increasingly used by the computational chemistry community for scientific applications. Work is ongoing in E-CAM to improve its scalability (see E-CAM deliverable D7.2) and to add new functionalities with a view to applications in the study of materials and light-harvesting complexes.

Module documentation can be found here, including a link to the source code.

Practical application and exploitation of the code

The module is currently being used in a PhD thesis, and the results of this application will provide benchmarks for a model describing proton transfer in a condensed-phase system.

 


A Conversation on Neural Networks, from Polymorph Recognition to Acceleration of Quantum Simulations

 

With Prof. Christoph Dellago (CD), University of Vienna, and Dr. Donal Mackernan (DM), University College Dublin.

 

Abstract

Recently there has been a dramatic increase in the use of machine learning in physics and chemistry, including its use to accelerate simulations of systems at an ab-initio level of accuracy, as well as for pattern recognition. It is now clear that these developments will significantly increase the impact of simulations on large scale systems requiring a quantum level of treatment, both for ground and excited states. These developments also lend themselves to simulations on massively parallel computing platforms, in many cases using classical simulation engines for quantum systems.

 
