Metal-ion force field developed by E-CAM using a novel machine learning procedure is now available for download

 

The database of force fields developed by the SNS SMART group (SNS, Pisa, Italy) is now available for download at http://smart.sns.it/vmd_molecules/. It includes the metal-ion force fields optimized within E-CAM using a novel machine learning procedure, which is reported in a recent publication [1] and in a case study reported by E-CAM here.

[1] Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone, Force Field Parametrization of Metal Ions from Statistical Learning Techniques, J. Chem. Theory Comput. 2018, 14, 255−273 DOI: 10.1021/acs.jctc.7b00779

 


The simulation of metal ions in protein-water systems using machine learning: An E-CAM case study and conversation

 

With Dr. Francesco Fracchia, Scuola Normale Superiore di Pisa

Interviewer: Dr. Donal Mackernan, University College Dublin

 

Abstract

One quarter to one third of all proteins require metals to function, but the description of metal ions in standard force fields is still quite primitive. This case study and interview describe an E-CAM project to develop a suitable parameterisation using machine learning. The training scheme combines classical simulation with electronic structure calculations to produce a force field comprising standard classical force fields with additional terms for the metal ion-water and metal ion-protein interactions. The approach allows simulations to run as fast as standard molecular dynamics codes, and is suitable for efficient scale-up on massively parallel machines.



GRASP Sampling – a module to build a representative data set for a fitting procedure

GRASP_sampling performs a stratified sampling of the configurations of a system, each described by a vector, to build a representative training set for a fitting procedure. Given a list of candidate configurations and the required size (N) of the training set, the module carries out a combinatorial optimization that maximizes the following dissimilarity score (DS) among the elements of the training set:

[Image: formula for the dissimilarity score (dissimilarity_score.png)]

In this formula, the j-th configuration in the sum is the j-th nearest one to the l-th configuration, and d_lj is the Euclidean distance between the l-th and j-th configurations. M is the number of nearest configurations considered in the score. The exponential weight makes the score nearly independent of the particular value of M, provided M is larger than about 4 to 6.

The combinatorial optimization that maximizes the dissimilarity score is performed using the greedy randomized adaptive search procedure (GRASP) algorithm [1]. A stratified sampling can be performed without a combinatorial optimization using classical statistical techniques (for example, Latin hypercube sampling). GRASP sampling becomes useful when the selection is restricted to a predetermined set of configurations, generated or sampled with specific internal constraints. This is the case for the molecular configurations generated in a molecular dynamics simulation.
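The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the module's implementation: the function names, the `rcl_size` and `iterations` parameters, and in particular the weight `exp(1 - j)` are hypothetical, since the exact formula for the score is not reproduced in the text.

```python
import math
import random


def dissimilarity_score(points, subset, m=5):
    """Score a candidate training set given as a list of indices into `points`.

    ASSUMPTION: the weight exp(1 - j) for the j-th nearest neighbour is
    illustrative only; it is not taken from the module's documentation.
    """
    score = 0.0
    for l in subset:
        # Distances from configuration l to the other selected ones, nearest first.
        dists = sorted(math.dist(points[l], points[j]) for j in subset if j != l)
        score += sum(math.exp(1 - rank) * d
                     for rank, d in enumerate(dists[:m], start=1))
    return score


def grasp_sample(points, n, iterations=20, rcl_size=3, m=5, seed=0):
    """Pick n of the given configurations with a basic GRASP loop."""
    rng = random.Random(seed)
    best, best_score = None, -math.inf
    for _ in range(iterations):
        # Construction phase: add one element at a time, chosen at random
        # among the rcl_size candidates that give the highest score.
        subset = [rng.randrange(len(points))]
        while len(subset) < n:
            candidates = [i for i in range(len(points)) if i not in subset]
            candidates.sort(
                key=lambda i: dissimilarity_score(points, subset + [i], m),
                reverse=True,
            )
            subset.append(rng.choice(candidates[:rcl_size]))
        # Local search phase: accept single swaps while they improve the score.
        score = dissimilarity_score(points, subset, m)
        improved = True
        while improved:
            improved = False
            for pos in range(n):
                for cand in range(len(points)):
                    if cand in subset:
                        continue
                    trial = subset[:pos] + [cand] + subset[pos + 1:]
                    trial_score = dissimilarity_score(points, trial, m)
                    if trial_score > score + 1e-12:
                        subset, score, improved = trial, trial_score, True
        if score > best_score:
            best, best_score = sorted(subset), score
    return best, best_score


# Hypothetical usage: 40 random 3-component descriptors, training set of 6.
demo_rng = random.Random(42)
configs = [tuple(demo_rng.random() for _ in range(3)) for _ in range(40)]
selected, ds = grasp_sample(configs, 6)
print(selected, ds)
```

Because the selection only ever picks indices from the supplied list, the sketch respects the restriction to a predetermined set of configurations; it recomputes the score from scratch at every step, which is acceptable for small examples but would need incremental updates for a long trajectory.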

The complete module documentation, including a link to the source code, can be found in our repository here.

Motivation and exploitation

The application of the GRASP algorithm to stratified sampling is described in a recent publication [2] by the E-CAM partners at Scuola Normale Superiore (SNS), which we previously reported here.

The motivation behind this software module is the pilot project with industry “Quantum Mechanical Parameterisation of Metal Ions in Proteins”, carried out by an E-CAM postdoctoral researcher from SNS.

 

[1] Feo, T. A.; Resende, M. G. Greedy randomized adaptive search procedures. J. Glob. Optim. 1995, 6, 109−133

[2] Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone, Force Field Parametrization of Metal Ions from Statistical Learning Techniques, J. Chem. Theory Comput. 2018, 14, 255−273


New article is out: “Force Field Parametrization of Metal Ions from Statistical Learning Techniques”

 

This paper from E-CAM partners working at Scuola Normale Superiore (Pisa, Italy) describes a novel statistical procedure developed to optimize the parameters of non-bonded force fields of metal ions in soft matter. The paper is open access and can be downloaded directly from the ACS page at http://pubs.acs.org/doi/10.1021/acs.jctc.7b00779.

This work was performed in the context of the E-CAM pilot project on Quantum Mechanical Parameterisation of Metal Ions in Proteins, which is a collaboration with BiKi Technologies. The list of software modules associated with the pilot project (and this publication) can be found here.

Article

Title: Force Field Parametrization of Metal Ions from Statistical Learning Techniques

Authors: Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone

Abstract: A novel statistical procedure has been developed to optimize the parameters of non-bonded force fields of metal ions in soft matter. The criterion for the optimization is the minimization of the deviations from ab initio forces and energies calculated for model systems. The method exploits the combination of the linear ridge regression and the cross-validation techniques with the differential evolution algorithm. Wide freedom in the choice of the functional form of the force fields is allowed since both linear and non-linear parameters can be optimized. In order to maximize the information content of the data employed in the fitting procedure, the composition of the training set is entrusted to a combinatorial optimization algorithm which maximizes the dissimilarity of the included instances. The methodology has been validated using the force field parametrization of five metal ions (Zn2+, Ni2+, Mg2+, Ca2+, and Na+) in water as test cases.
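One way the combination named in the abstract could be arranged is sketched below: ridge regression solves for the linear parameters, cross-validation scores each choice of the non-linear ones, and differential evolution searches over them. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy functional form E(r) = A exp(-b r) + C / r, the synthetic data, the parameter bounds, the fold layout, and all function names.

```python
import math
import random


def ridge_fit(features, targets, lam):
    """Solve (Phi^T Phi + lam I) w = Phi^T y for two linear coefficients."""
    s11 = sum(p[0] * p[0] for p in features) + lam
    s12 = sum(p[0] * p[1] for p in features)
    s22 = sum(p[1] * p[1] for p in features) + lam
    t1 = sum(p[0] * y for p, y in zip(features, targets))
    t2 = sum(p[1] * y for p, y in zip(features, targets))
    det = s11 * s22 - s12 * s12
    return ((t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det)


def cv_error(b, lam, radii, energies, folds=4):
    """Mean squared k-fold prediction error for non-linear parameter b."""
    feats = [(math.exp(-b * r), 1.0 / r) for r in radii]
    total = 0.0
    for k in range(folds):
        train = [i for i in range(len(radii)) if i % folds != k]
        held_out = [i for i in range(len(radii)) if i % folds == k]
        w = ridge_fit([feats[i] for i in train], [energies[i] for i in train], lam)
        total += sum((w[0] * feats[i][0] + w[1] * feats[i][1] - energies[i]) ** 2
                     for i in held_out)
    return total / len(radii)


def differential_evolution(cost, bounds, pop_size=20, generations=60,
                           f=0.7, cr=0.9, seed=0):
    """Plain rand/1/bin differential evolution with clipping to the bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one mutated component
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))
                else:
                    trial.append(pop[i][d])
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy selection
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def fit(radii, energies):
    """Search b and log10(lambda) by DE; refit linear terms on all data."""
    cost = lambda p: cv_error(p[0], 10.0 ** p[1], radii, energies)
    (b, log_lam), err = differential_evolution(cost, [(0.2, 3.0), (-8.0, -2.0)])
    lam = 10.0 ** log_lam
    a, c = ridge_fit([(math.exp(-b * r), 1.0 / r) for r in radii], energies, lam)
    return b, lam, a, c, err


# Synthetic, noise-free reference data (hypothetical values, not ab initio).
radii = [1.0 + 0.1 * i for i in range(31)]
energies = [5.0 * math.exp(-1.2 * r) - 0.8 / r for r in radii]
print(fit(radii, energies))
```

Splitting the parameters this way means the stochastic search only has to explore the non-linear ones, while the linear coefficients come from a closed-form solve at every evaluation.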

 
