Dr. Francesco Fracchia
Host beneficiary: Scuola Normale Superiore, Pisa, Italy
Industrial partner: BiKi Technologies, Genova, Italy
About one quarter to one third of all proteins require metals to function, but the description of metal ions in standard force fields is still quite primitive. In this collaborative project with BiKi Technologies, a suitable parameterisation of the non-bonded interactions of the metal ion will be derived from DFT calculations so that the force field reproduces the correct behaviour in the relevant ion configurations. The outcome of this task will be a novel and general approach to metalloprotein parameterisation and a procedure for generating molecular mechanics parameters from quantum chemical potentials.
List of Tasks
List of Modules
Status: delivered. Description: This module performs a stratified sampling of the configurations of a system, each described by a vector, to build a representative training set for a fitting procedure. Given a list of candidate configurations and the required size of the training set, the module carries out the combinatorial optimization that maximizes a dissimilarity score among the trial elements of the training set. The dissimilarity score is based on the mutual distances between the elements; maximizing it ensures high coverage of the configuration domain.
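To make the idea of maximizing a distance-based dissimilarity score concrete, here is a minimal sketch in plain Python. It is not the module's actual algorithm: the score (the smallest pairwise distance in the subset), the greedy farthest-point construction, the swap-based local search, and the names `min_pairwise_distance` and `select_training_set` are all assumptions made for illustration.

```python
import itertools
import math


def min_pairwise_distance(points, subset):
    """Assumed dissimilarity score: smallest distance between any two chosen points."""
    return min(math.dist(points[i], points[j])
               for i, j in itertools.combinations(subset, 2))


def select_training_set(points, size, start=0):
    """Pick `size` (>= 2) mutually distant points: greedy build, then swap search."""
    chosen = [start]
    while len(chosen) < size:
        # Add the candidate whose nearest already-chosen neighbour is farthest away.
        rest = [i for i in range(len(points)) if i not in chosen]
        best = max(rest, key=lambda i: min(math.dist(points[i], points[j])
                                           for j in chosen))
        chosen.append(best)

    score = min_pairwise_distance(points, chosen)
    improved = True
    while improved:
        improved = False
        # Try replacing each chosen element by each unused candidate.
        for pos in range(size):
            for cand in range(len(points)):
                if cand in chosen:
                    continue
                trial = chosen[:pos] + [cand] + chosen[pos + 1:]
                trial_score = min_pairwise_distance(points, trial)
                if trial_score > score:
                    chosen, score, improved = trial, trial_score, True
    return sorted(chosen), score
```

The function returns the indices of the selected configurations together with the final score. The local search only accepts strictly improving swaps, so it can stop in a local optimum; a randomized restart scheme would be the natural next step.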
Status: delivered. Description: This module reads the configurations of a molecular system generated by GROMACS and prepares the input for the GRASP Sampling module. The module performs the following operations:
- It reads the configurations generated by the molecular dynamics simulation.
- It calculates, for each configuration, the Euclidean distances of all atoms from the metal ion.
- It identifies the permutationally equivalent atoms.
- It performs a Gaussian transformation of the distances.
- It calculates the variances of the transformed distances.
- It selects the coordinates with the highest variances.
- It prepares the input for the GRASP Sampling module as a matrix including the transformed distances for all the configurations.
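The steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the Gaussian transformation is taken to be `exp(-d^2 / (2 sigma^2))` with a hypothetical width `sigma`, the configurations are assumed to be already loaded as lists of `(x, y, z)` tuples, and the identification of permutationally equivalent atoms is left out. The function name `build_sampling_matrix` is hypothetical.

```python
import math
import statistics


def build_sampling_matrix(configs, metal_index, n_keep, sigma=3.0):
    """configs: list of configurations, each a list of (x, y, z) atom positions."""
    others = [i for i in range(len(configs[0])) if i != metal_index]
    rows = []
    for atoms in configs:
        metal = atoms[metal_index]
        # Euclidean distance of every other atom from the metal ion.
        dists = [math.dist(metal, atoms[i]) for i in others]
        # Gaussian transformation (assumed form): near atoms map close to 1,
        # distant atoms close to 0.
        rows.append([math.exp(-d * d / (2.0 * sigma * sigma)) for d in dists])

    # Variance of each transformed distance across all configurations.
    variances = [statistics.pvariance(column) for column in zip(*rows)]
    # Keep the n_keep coordinates with the highest variance.
    ranked = sorted(range(len(others)), key=lambda k: variances[k], reverse=True)
    keep = sorted(ranked[:n_keep])
    kept_atoms = [others[k] for k in keep]
    # One row per configuration, one column per retained coordinate.
    matrix = [[row[k] for k in keep] for row in rows]
    return kept_atoms, matrix
```

The returned matrix has one row per configuration, which is the shape a distance-based sampling step would consume; `kept_atoms` records which atoms the retained columns refer to.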
Status: delivered. Description: This module generates input files for the software package Gaussian for the clusters included in the training set, in order to produce the reference data for the fitting procedure of the force fields. The clusters are generated by cutting them out of the selected configurations. The module saturates the residues when necessary and writes, in the Gaussian format, the input file for calculating the energy and the forces of each generated cluster.
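As a rough illustration of the last step, the sketch below writes a Gaussian input for an energy-and-forces calculation on an already prepared cluster. The function name `gaussian_force_input` is hypothetical, and the level of theory (`B3LYP/6-31G(d)`) is only a placeholder, not the one used in the project; cutting the cluster and saturating residues are not covered.

```python
def gaussian_force_input(atoms, charge, multiplicity, title="cluster",
                         method="B3LYP", basis="6-31G(d)"):
    """atoms: list of (element_symbol, x, y, z) in Angstrom.

    The method and basis defaults are placeholders chosen for this example.
    """
    lines = [
        f"#P {method}/{basis} Force",  # route section: the Force keyword requests energy and forces
        "",                            # blank line ends the route section
        title,
        "",                            # blank line ends the title section
        f"{charge} {multiplicity}",
    ]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2s} {x:14.8f} {y:14.8f} {z:14.8f}")
    lines.append("")                   # blank line ends the molecule specification
    return "\n".join(lines) + "\n"
```

The returned string can be written to a `.com` or `.gjf` file, one per cluster in the training set.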
Status: delivered. Description: This module performs a single-objective global optimization in a continuous domain using the metaheuristic algorithm Success-History based Adaptive Differential Evolution (SHADE). SHADE is a recent adaptive version of the differential evolution algorithm, a stochastic, population-based, derivative-free optimizer. In the fitting procedure, the module is employed to optimize the regularization parameter and the non-linear parameters of the force-field model.
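For readers unfamiliar with SHADE, the following compact sketch shows its main ingredients in plain Python: current-to-pbest/1 mutation with an external archive, and a memory of successful scale factors and crossover rates that steers the sampling of new control parameters. It is a simplified illustration, not the module's code; the function name `shade_minimize`, the default settings and the bound handling are assumptions.

```python
import math
import random


def shade_minimize(f, bounds, pop_size=20, memory_size=5, generations=200, seed=0):
    """Minimize f over the box `bounds`; pop_size must be at least 3."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    mem_f = [0.5] * memory_size    # history of successful scale factors F
    mem_cr = [0.5] * memory_size   # history of successful crossover rates CR
    archive, slot = [], 0

    for _ in range(generations):
        order = sorted(range(pop_size), key=lambda i: fit[i])
        succ_f, succ_cr, gains = [], [], []
        new_pop, new_fit = list(pop), list(fit)
        for i in range(pop_size):
            r = rng.randrange(memory_size)
            cr = min(1.0, max(0.0, rng.gauss(mem_cr[r], 0.1)))
            scale = 0.0
            while scale <= 0.0:  # Cauchy-distributed F, resampled until positive
                scale = mem_f[r] + 0.1 * math.tan(math.pi * (rng.random() - 0.5))
            scale = min(scale, 1.0)
            # current-to-pbest/1 mutation; the second difference vector may
            # draw from the archive of replaced parents.
            n_best = max(2, round(rng.uniform(2.0 / pop_size, 0.2) * pop_size))
            best = pop[rng.choice(order[:n_best])]
            r1 = rng.choice([j for j in range(pop_size) if j != i])
            pool = [pop[j] for j in range(pop_size) if j not in (i, r1)] + archive
            x2 = rng.choice(pool)
            forced = rng.randrange(dim)  # at least one mutated component
            trial = []
            for d in range(dim):
                v = pop[i][d]
                if d == forced or rng.random() < cr:
                    v += scale * (best[d] - pop[i][d]) + scale * (pop[r1][d] - x2[d])
                    lo, hi = bounds[d]
                    if v < lo:       # out of bounds: move halfway to the bound
                        v = 0.5 * (lo + pop[i][d])
                    elif v > hi:
                        v = 0.5 * (hi + pop[i][d])
                trial.append(v)
            f_trial = f(trial)
            if f_trial <= fit[i]:
                if f_trial < fit[i]:  # strict improvement counts as a success
                    archive.append(pop[i])
                    succ_f.append(scale)
                    succ_cr.append(cr)
                    gains.append(fit[i] - f_trial)
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
        while len(archive) > pop_size:
            archive.pop(rng.randrange(len(archive)))
        if gains:
            total = sum(gains)
            w = [g / total for g in gains]
            # Weighted Lehmer mean for F, weighted arithmetic mean for CR.
            mem_f[slot] = (sum(wk * s * s for wk, s in zip(w, succ_f))
                           / sum(wk * s for wk, s in zip(w, succ_f)))
            mem_cr[slot] = sum(wk * c for wk, c in zip(w, succ_cr))
            slot = (slot + 1) % memory_size
    best_i = min(range(pop_size), key=lambda i: fit[i])
    return pop[best_i], fit[best_i]
```

In a fitting workflow of the kind described here, `f` would wrap the inner linear fit and return a validation error as a function of the regularization parameter and the non-linear model parameters.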
Weighted Linear Ridge Regression
Status: delivered. Description: This module solves the weighted linear ridge regression problem, calculating the linear parameters of a user-selected model that minimize the deviations of the predictions from the reference values of the data set. It is therefore a supervised learning tool that optimizes the linear parameters of an analytical expression in order to fit a data set. Each element of the data set can be weighted according to the relative importance or reliability attributed to it by the user. The regularization protects against overfitting, which can occur when the model is too flexible in relation to the available data. Moreover, the module calculates the leave-one-out cross-validation error for the data set employed.
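A minimal sketch of the underlying linear algebra, in plain Python, may help. The coefficients solve (XᵀWX + λI)β = XᵀWy, and the leave-one-out residual of point i follows from the fitted residual without refitting, as r_i / (1 − h_i) with h_i = w_i x_iᵀ(XᵀWX + λI)⁻¹x_i. The function names, the penalty applied to every coefficient, and the definition of the reported error as a weighted root mean square are assumptions of this example, not necessarily the module's conventions.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small systems)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def weighted_ridge(X, y, weights, lam):
    """Return (beta, loo_rmse) for rows X, targets y, weights and penalty lam."""
    p = len(X[0])
    # A = X^T W X + lam * I and b = X^T W y
    A = [[sum(w * x[j] * x[k] for x, w in zip(X, weights)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(w * x[j] * t for x, t, w in zip(X, y, weights)) for j in range(p)]
    A_inv = invert(A)
    beta = [sum(A_inv[j][k] * b[k] for k in range(p)) for j in range(p)]

    loo_sq = 0.0
    for x, t, w in zip(X, y, weights):
        residual = t - sum(bj * xj for bj, xj in zip(beta, x))
        # Leverage of this point in the weighted, regularized fit.
        leverage = w * sum(x[j] * A_inv[j][k] * x[k]
                           for j in range(p) for k in range(p))
        loo_sq += w * (residual / (1.0 - leverage)) ** 2
    loo_rmse = (loo_sq / sum(weights)) ** 0.5
    return beta, loo_rmse
```

Because the leave-one-out error comes from a closed-form expression, it is cheap enough to serve as the objective that an outer optimizer minimizes when tuning `lam`.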
Force Field Parametrization of Metal Ions from Statistical Learning Techniques
Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone, J. Chem. Theory Comput. 2018, 14, 255−273
Open access version
Metal-ion force field developed by E-CAM using novel Machine Learning procedure is now available for download
The simulation of metal ions in protein-water systems using machine learning: An E-CAM case study and conversation
GRASP Sampling – a module to build a representative data set for a fitting procedure
New article is out: “Force Field Parametrization of Metal Ions from Statistical Learning Techniques”