GRASP Sampling – a module to build a representative data set for a fitting procedure

GRASP_sampling performs a stratified sampling of the configurations of a system, each described by a vector, in order to build a representative training set for a fitting procedure. Given a list of candidate configurations and the required size N of the training set, the module solves the combinatorial optimization problem of selecting the N configurations that maximize the following dissimilarity score (DS) among the elements of the training set:

[Figure: formula for the dissimilarity score (DS)]

In this formula, the j-th configuration in the sum is the j-th nearest configuration to the l-th one, and d_lj is the Euclidean distance between the l-th and j-th configurations. M is the number of nearest configurations included in the score. The exponential weight makes the score nearly independent of the particular value of M, provided M is larger than about 4 to 6.

The combinatorial optimization that maximizes the dissimilarity score is performed with the greedy randomized adaptive search procedure (GRASP) algorithm [1]. A stratified sampling can be performed without combinatorial optimization using classical statistical techniques (for example, Latin hypercube sampling). GRASP sampling becomes useful when the selection is restricted to a predetermined set of configurations that were generated or sampled under specific internal constraints, as is the case for the molecular configurations generated in a molecular dynamics simulation.
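The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the GRASP_sampling source: the weight w_j = 2^-(j-1) is an assumption standing in for the exponential weight of the DS formula, and the function names (dissimilarity_score, grasp_select) and the parameters alpha (fraction of candidates kept in the restricted candidate list) and n_iter are hypothetical. Each GRASP iteration builds a set with a greedy randomized construction, improves it by single-element swaps, and the best set over all iterations is kept.

```python
import math
import random


def dissimilarity_score(points, m=6, weight=lambda j: 0.5 ** (j - 1)):
    """Sum over configurations l of the weighted distances to their m nearest
    neighbours in the set. The default weight 2**-(j-1) is an assumption."""
    total = 0.0
    for l, p in enumerate(points):
        dists = sorted(math.dist(p, q) for k, q in enumerate(points) if k != l)
        total += sum(weight(j) * d for j, d in enumerate(dists[:m], start=1))
    return total


def grasp_select(candidates, n, m=6, alpha=0.3, n_iter=20, seed=0):
    """Pick n of the candidate vectors (assumes len(candidates) >= n)."""
    rng = random.Random(seed)

    def score(idx):
        return dissimilarity_score([candidates[i] for i in idx], m)

    best, best_score = None, -math.inf
    for _ in range(n_iter):
        # Construction: add one configuration at a time, chosen at random
        # among the top alpha fraction ranked by the DS of the enlarged set.
        selected = [rng.randrange(len(candidates))]
        while len(selected) < n:
            ranked = sorted(
                ((score(selected + [c]), c)
                 for c in range(len(candidates)) if c not in selected),
                reverse=True)
            rcl = ranked[:max(1, int(alpha * len(ranked)))]
            selected.append(rng.choice(rcl)[1])
        # Local search: accept any swap (selected <-> unselected) that raises DS.
        current = score(selected)
        improved = True
        while improved:
            improved = False
            for pos in range(n):
                for c in range(len(candidates)):
                    if c in selected:
                        continue
                    trial = selected[:pos] + [c] + selected[pos + 1:]
                    s = score(trial)
                    if s > current + 1e-12:
                        selected, current, improved = trial, s, True
                        break
                if improved:
                    break
        if current > best_score:
            best, best_score = sorted(selected), current
    return best, best_score
```

For ten equally spaced points on a line, selecting n=3 with m=2 returns the two end points plus one of the two most central points, which is what "maximally dissimilar" should mean for this toy case.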

The complete module documentation, including a link to the source code, can be found in our repository here.

Motivation and exploitation

The application of the GRASP algorithm to stratified sampling is described in a recent publication [2] by the E-CAM partners at Scuola Normale Superiore (SNS), which we previously reported on here.

The motivation behind this software module is the pilot project with industry “Quantum Mechanical Parameterisation of Metal Ions in Proteins”, carried out by an E-CAM postdoctoral researcher at SNS.


[1] T. A. Feo and M. G. Resende, Greedy randomized adaptive search procedures, J. Glob. Optim. 1995, 6, 109–133.

[2] Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone, Force Field Parametrization of Metal Ions from Statistical Learning Techniques, J. Chem. Theory Comput. 2018, 14, 255–273.


Geomoltools: A set of software modules to easily manipulate molecular geometries

Geomoltools is a set of eight pre- and post-processing Fortran codes for easily manipulating molecular geometries. They can be used, for example, to minimize the average energy obtained over a range of internuclear distances for the dimers of each element, and to reduce the computational cost of a DFT calculation.

The codes are:

  • mol2xyz: converts a .mol file into an ordered .xyz file
  • pastemol: joins two .xyz files
  • movemol: translates and aligns the molecule with some predefined axes
  • stackmol: generates (manually or randomly) different stacking arrangements between two molecules
  • geodiff: compares the internal coordinates of two molecules
  • xyz2zmt_s: converts the Cartesian coordinates contained in a .xyz file into a Z-matrix (2 possible formats)
  • zmt2xyz_s: converts a Z-matrix (from 2 possible formats) into Cartesian coordinates
  • ucubcellgen: calculates the vectors of a unit cell given some atomic coordinates
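Several of these tools (xyz2zmt_s, zmt2xyz_s, geodiff) revolve around internal coordinates: a Z-matrix specifies each atom by a bond length, a bond angle, and a dihedral angle relative to previously defined atoms. The Python sketch below computes these three quantities from Cartesian coordinates. It only illustrates the geometry involved; Geomoltools itself is written in Fortran, and the function names here are hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def bond_length(a, b):
    """Distance between atoms a and b."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def bond_angle(a, b, c):
    """Angle a-b-c in degrees (b is the vertex)."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(a, b, c, d):
    """Signed torsion angle a-b-c-d in degrees, in the range [-180, 180]."""
    b0, b1, b2 = _sub(a, b), _sub(c, b), _sub(d, c)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = [x / n1 for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))
```

Comparing two conformers with these functions, atom by atom, is the kind of internal-coordinate comparison that geodiff describes.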

The source code of the modules can be found here. For a detailed explanation of the main programs, please see this file. A complete tutorial on how to use the codes of the Geomoltools package to manipulate (rotate, translate, join, pack, convert, etc.) molecular geometries can be found at this address.

Motivation and exploitation

These modules have been used to study the stacking arrangements of acceptor:donor molecules in organic photovoltaic polymers through high-throughput computations with the SIESTA code. The codes are available under the GNU General Public License (GPL) version 2.


Path density for OpenPathSampling

The path density module implements path density calculations for the OpenPathSampling (OPS) package, including a generic multidimensional sparse histogram and plotting functions for the two-dimensional case. Path density plots provide a way to visualize kinetic information obtained from path sampling, such as the mechanism of a rare event. In addition, the code in this module can be used to visualize thermodynamic information such as free energy landscapes.

This module has been incorporated into the core of OPS, an open-source Python package for path sampling that wraps around other classical Molecular Dynamics (MD) codes [1]. An easy-to-read article on the use of path sampling methods to study rare events, and on the role of the OPS package in performing these simulations, can be found here.

At first glance, a typical path density plot may appear similar to a two-dimensional free energy landscape plot. They are both “heatmap”-type plots, plotting a two-dimensional histogram in some pair of collective variables. However, path density differs from free energy in several important respects:

  • A path density plot is histogrammed according to the number of paths, not the number of configurations. So if a cell is visited more than once during a path, it still only gets counted once.
  • A path density plot may interpolate across cells that the path jumps over, because the input is assumed to be continuous.
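The two rules above (count each cell at most once per path, and fill in cells that a path jumps over) can be illustrated with a small sparse histogram in Python. This is a hypothetical sketch, not the OPS implementation or API: the function names, the use of a Counter keyed by cell-index tuples, and the linear interpolation in cell-index space are our assumptions.

```python
import math
from collections import Counter


def cells_along_path(path, bin_widths):
    """Return the set of histogram cells visited by one path (a sequence of
    points in collective-variable space), filling in cells skipped between
    consecutive frames by linear interpolation in cell-index space."""
    cells = set()
    prev = None
    for point in path:
        cell = tuple(math.floor(x / w) for x, w in zip(point, bin_widths))
        if prev is not None:
            n = max(abs(c - p) for c, p in zip(cell, prev))
            for k in range(1, n):
                cells.add(tuple(p + round(k * (c - p) / n)
                                for c, p in zip(cell, prev)))
        cells.add(cell)
        prev = cell
    return cells


def path_density(paths, bin_widths):
    """Sparse histogram: each path adds at most 1 to any given cell."""
    hist = Counter()
    for path in paths:
        hist.update(cells_along_path(path, bin_widths))
    return hist
```

Because the histogram is a dictionary keyed by cell indices, it works in any number of dimensions and stores only the cells that were actually visited. A path that goes out and comes back through the same cells still contributes only once to each of them.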

These differences can prevent metastable regions from overwhelming the transition regions in the plot. When looking at mechanisms, the path density is a more useful tool than the raw configurational probability.

Module documentation, including a link to the source code, can be found here. This and other software modules for studying the thermodynamics and kinetics of rare events were recently documented in deliverable D1.2: Classical MD E-CAM modules I, available here.

Motivation and exploitation

The path density is one of the most important tools for visualizing mechanisms, and is often one of the first things to analyze in order to draw scientific conclusions about the mechanism from transition path sampling simulations. This module was used to illustrate the differences between the dynamics of the wild-type and oncogenic mutant forms of KRas, as part of one student’s master’s thesis and another student’s bachelor’s thesis at the University of Amsterdam. Results from those projects are currently in preparation for publication [2].


[1] Jan-Hendrik Prinz, David W.H. Swenson, Peter G. Bolhuis, and John D. Chodera. OpenPathSampling: A Python framework for path sampling simulations. I. Introduction and usage. In prep.
[2] Sander Roet, Ferry Hooft, Peter G. Bolhuis, David W.H. Swenson, and Jocelyne Vreede. Simulating the dynamics of oncogenic and wild-type KRas. In prep.


Second-Order Differencing Scheme

This module, SodLib, provides exact wavefunction propagation using the second-order differencing (SOD) integrator scheme to solve the time-dependent Schrödinger equation, as described by Leforestier et al., J. Comput. Phys. 94, 59–80 (1991). Within this scheme, the time step is determined by dividing ħ by the eigenvalue of the Hamiltonian operator with the largest absolute value. This routine has been implemented and tested as an added functionality within the Quantics software package, available through CCPForge.
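The SOD update is psi(t + dt) = psi(t - dt) - 2i (dt/ħ) H psi(t). The NumPy sketch below applies it to a small Hamiltonian matrix and compares the result with exact propagation by diagonalization. It is an illustration only, not SodLib or Quantics code: the function names are hypothetical, ħ = 1 by default, the first step is bootstrapped with a second-order Taylor expansion (our assumption), and dt is taken as a fraction (safety) of ħ/|E|max, because the scheme is stable only for dt below that bound.

```python
import numpy as np


def sod_propagate(H, psi0, t_final, hbar=1.0, safety=0.1):
    """Second-order differencing:
    psi(t + dt) = psi(t - dt) - 2i (dt/hbar) H psi(t)."""
    psi0 = np.asarray(psi0, dtype=complex)
    e_max = np.max(np.abs(np.linalg.eigvalsh(H)))
    if e_max == 0.0:
        return psi0.copy()  # H = 0: nothing evolves
    # Stability requires dt < hbar / |E|max; use a fraction of that bound.
    n_steps = max(2, int(np.ceil(t_final * e_max / (safety * hbar))))
    dt = t_final / n_steps
    a = -1j * dt / hbar
    # Bootstrap psi(dt) with a second-order Taylor step.
    h_psi = H @ psi0
    psi_prev, psi_curr = psi0, psi0 + a * h_psi + 0.5 * a**2 * (H @ h_psi)
    for _ in range(n_steps - 1):
        psi_prev, psi_curr = psi_curr, psi_prev + 2 * a * (H @ psi_curr)
    return psi_curr


def exact_propagate(H, psi0, t, hbar=1.0):
    """Reference: psi(t) = V exp(-i E t / hbar) V^dagger psi0."""
    evals, evecs = np.linalg.eigh(H)
    phases = np.exp(-1j * evals * t / hbar)
    return evecs @ (phases * (evecs.conj().T @ np.asarray(psi0, dtype=complex)))
```

For a two-level system with H = [[0, 1], [1, 0]] and psi0 = [1, 0], both functions give approximately [0, -i] at t = π/2. Reducing safety shrinks the SOD error roughly quadratically, as expected for a second-order scheme.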

Quantics is a package for studying chemical reactions of molecules; its main developer (G. Worth, University College London) is a member of E-CAM’s WP3 – Quantum Dynamics. It incorporates a variety of quantum dynamical methods, united by the fact that the state of the system is usually described by wavefunctions (which contain the quantum analogue of the information given by the positions and velocities of classical atoms). It is increasingly used by the computational chemistry community for scientific applications. Work is ongoing in E-CAM to improve its scalability (see E-CAM deliverable D7.2) and to add new functionalities for applications to materials and light-harvesting complexes.

Module documentation can be found here, including a link to the source code.

Practical application and exploitation of the code

The module is currently being used in a PhD thesis, and the results of this application will provide benchmarks for a model describing proton transfer in a condensed-phase system.


First GPU version of the DL_MESO_DPD code

DL_MESO_DPD is the Dissipative Particle Dynamics (DPD) code from the mesoscopic simulation package DL_MESO [1], developed by Dr. Michael Seaton at Daresbury Laboratory (UK). This open-source code is available from the Science and Technology Facilities Council (STFC) under both academic (free) and commercial (paid) licenses. E-CAM’s Work Package 4 (WP4), Meso and Multi-scale Modelling, makes use of the DL_MESO_DPD code. See this article on our news feed for more information on how it is used within E-CAM.

To accelerate the DL_MESO_DPD code on the latest hardware and on future exascale systems, a first version for NVIDIA GPUs has been developed. This is only a starting point: it does not yet cover all possible cases, and it does not yet support multiple GPUs. However, it represents an HPC milestone for the application, complementing the existing parallel versions for shared and distributed memory (MPI/OpenMP).

Module documentation, including purpose, testing, and background information, can be found here. The performance comparison between the GPU and CPU versions can be found in the module documentation and in deliverable D7.2: E-CAM software porting and benchmarking data I, recently submitted to the EU.

[1] Michael A. Seaton, Richard L. Anderson, Sebastian Metz, and William Smith. DL_MESO: highly scalable mesoscale simulations. Molecular Simulation, 39(10):796–821, September 2013.


LibOMM: Orbital Minimization Method Library

Purpose

The library LibOMM solves the Kohn-Sham equation as a generalized eigenvalue problem for a fixed Hamiltonian. It implements the orbital minimization method (OMM), which works within a density matrix formalism. The basic strategy of the OMM is to find the set of Wannier functions (WFs) describing the occupied subspace by direct unconstrained minimization of an appropriately constructed functional. The density matrix can then be calculated from the WFs. The solver is usually employed within an outer self-consistent field (SCF) cycle; therefore, the WFs resulting from one SCF iteration can be saved and reused as the initial guess for the next iteration.
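The minimization can be illustrated on a small dense matrix. The NumPy sketch below uses one common form of the OMM functional, E[C] = tr[(2I - S_W) H_W] with H_W = CᵀHC and S_W = CᵀSC (spin factor omitted), whose gradient is 4HC - 2SCH_W - 2HCS_W. It is not the libOMM API: the function names are hypothetical, and plain fixed-step gradient descent replaces the more efficient conjugate-gradient schemes normally used. The intended minimum (C spanning the occupied subspace, with S_W = I) is a minimum only when the occupied eigenvalues are negative, so the example spectrum is chosen accordingly.

```python
import numpy as np


def omm_energy_and_gradient(C, H, S):
    """E[C] = tr[(2I - S_W) H_W], H_W = C^T H C, S_W = C^T S C."""
    HW = C.T @ H @ C
    SW = C.T @ S @ C
    energy = np.trace((2.0 * np.eye(C.shape[1]) - SW) @ HW)
    grad = 4.0 * H @ C - 2.0 * S @ C @ HW - 2.0 * H @ C @ SW
    return energy, grad


def omm_minimize(H, S, k, step=0.02, n_iter=3000, seed=0):
    """Unconstrained fixed-step gradient descent on k orbital vectors."""
    rng = np.random.default_rng(seed)
    C = 0.1 * rng.standard_normal((H.shape[0], k))  # small random start
    for _ in range(n_iter):
        _, grad = omm_energy_and_gradient(C, H, S)
        C = C - step * grad
    energy, _ = omm_energy_and_gradient(C, H, S)
    return energy, C


# Example: orthogonal basis (S = I), three negative "occupied" eigenvalues.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
evals = np.array([-2.0, -1.5, -1.0, 0.5, 1.0, 2.0])
H = Q @ np.diag(evals) @ Q.T
energy, C = omm_minimize(H, np.eye(6), k=3)
density = C @ C.T  # at the minimum S_W = I, so this is the occupied projector
```

No orthonormality constraint is imposed during the descent; the converged vectors nevertheless come out orthonormal, the energy approaches the sum of the three occupied eigenvalues (-4.5), and the density matrix built from them matches the projector onto the occupied eigenvectors.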

More information can be found in the module’s documentation here, and the source code is available from the E-CAM GitLab here. The algorithms and implementation of the library are described in https://arxiv.org/abs/1312.1549v1.

This module is an effort of the Electronic Structure Library (ESL) project, and it was initiated during an E-CAM Extended Software Development Workshop in Zaragoza in June 2016. This and other codes revolving around the broad theme of solvers were recently reported in Deliverable D2.1: Electronic structure E-CAM modules I, available for download and consultation here.

Practical application and exploitation of the module

libOMM is one of the libraries supported and enhanced by the Electronic Structure Infrastructure ELSI [1], which in turn is interfaced with the DGDFT, FHI-aims, NWChem, and SIESTA codes.

[1] The electronic structure infrastructure ELSI provides and enhances scalable, open-source software library solutions for electronic structure calculations in materials science, condensed matter physics, chemistry, molecular biochemistry, and many other fields [https://arxiv.org/abs/1705.11191v1].


Analysis of charge dipole moments in DL_MESO_DPD

The present module, gen_dipole.f90, is a generalization of the dipole.f90 post-processing utility of DL_MESO_DPD, the Dissipative Particle Dynamics (DPD) code from the DL_MESO package. It processes the trajectory (HISTORY) files to obtain the charge dipole moments of all the (neutral) molecules in the system, and writes files dipole_* containing the time evolution of the relevant quantities (see the module documentation for more information). In the case of a single molecular species, it also prints to standard output the Kirkwood number g_k and the relative electric permittivity ε_r for this species, together with estimates of their errors (standard errors).

The module can be applied to systems including molecules with a generic charge structure, as long as each molecule is neutral (otherwise the charge dipole moment would be frame-dependent).
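The neutrality requirement follows from the definition p = Σ_i q_i r_i: shifting the origin by d changes p by (Σ_i q_i) d, which vanishes only for a neutral molecule. The Python sketch below shows this and estimates a Kirkwood factor from molecular dipoles. It is an illustration, not gen_dipole.f90: the function names are hypothetical, positions are assumed to be already unwrapped across periodic boundaries, and g_k = <|M|²> / (N <|μ|²>), with M the total dipole of the N molecules in a frame, is one common definition that may differ in detail from the one used by the module.

```python
def dipole_moment(charges, positions):
    """Charge dipole p = sum_i q_i r_i of one molecule."""
    return [sum(q * r[k] for q, r in zip(charges, positions)) for k in range(3)]


def kirkwood_factor(frames):
    """frames: list of frames, each a list of N molecular dipole vectors.
    Returns <|M|^2> / (N <|mu|^2>), where M is the total dipole of a frame."""
    n_mol = len(frames[0])
    m2_sum, mu2_sum = 0.0, 0.0
    for dipoles in frames:
        total = [sum(p[k] for p in dipoles) for k in range(3)]
        m2_sum += sum(c * c for c in total)
        mu2_sum += sum(c * c for p in dipoles for c in p)
    mean_m2 = m2_sum / len(frames)
    mean_mu2 = mu2_sum / (len(frames) * n_mol)
    return mean_m2 / (n_mol * mean_mu2)
```

Two parallel unit dipoles give g_k = 2 and two antiparallel ones give g_k = 0, the limiting cases of positive and negative orientational correlation.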

gen_dipole.f90 is available under the BSD license and is a post-processing utility to be used with the latest released version of DL_MESO, version 2.6 (November 2015). It was developed in the context of pilot project 1 of WP4, which concerns the derivation of a realistic polarizable model of water for DPD simulations. This project involves a collaboration between computational scientists (STFC Daresbury), academia (University of Manchester), and industry (Unilever). This and other modules based on DL_MESO_DPD have recently been reported in deliverable D4.2: Meso- and multi-scale modelling E-CAM modules I, available for consultation here.


Solvers for quantum atomic radial equations

SQARE (solvers for quantum atomic radial equations) is a library of utilities for handling functions discretized on radial meshes, wave equations with spherical symmetry, and their corresponding quantum states. The utilities are organized into three levels: radial grids and functions, ODE solvers, and states.
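As an illustration of the first level (radial grids and functions), the Python sketch below builds a logarithmic mesh and integrates a radial function on it, checking the normalization of the hydrogen 1s radial function R(r) = 2e^(-r) in atomic units. The logarithmic mesh and the trapezoid rule in the uniform variable x = ln(r/r_min) are assumptions made for the example; they are not taken from SQARE, whose grid types and routine names may differ.

```python
import math


def log_mesh(r_min, r_max, n):
    """Logarithmic mesh r_i = r_min * exp(i * h), i = 0..n-1."""
    h = math.log(r_max / r_min) / (n - 1)
    return [r_min * math.exp(i * h) for i in range(n)], h


def radial_integral(values, r, h):
    """Approximate the integral of f(r) r^2 dr for f sampled on a log mesh.
    With r = r_min e^x we have dr = r dx, so the integrand in the uniform
    variable x is f(r) r^3, integrated here with the trapezoid rule."""
    g = [f * ri ** 3 for f, ri in zip(values, r)]
    return h * (sum(g) - 0.5 * (g[0] + g[-1]))


r, h = log_mesh(1e-5, 40.0, 2000)
density = [(2.0 * math.exp(-ri)) ** 2 for ri in r]  # |R_1s(r)|^2
norm = radial_integral(density, r, h)  # close to 1
mean_r = radial_integral([ri * d for ri, d in zip(r, density)], r, h)  # close to 1.5
```

A logarithmic mesh concentrates points near the nucleus, where atomic wavefunctions vary fastest, while still reaching large r with few points.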

For more information, see the documentation of the modules SQARE radial grids and functions, SQARE ODE, and SQARE states.


ClassMC

The ClassMC module samples the phase space of the system using the classical Boltzmann distribution and calculates time correlation functions from the sampled initial conditions. For more information, check the module documentation here.
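The two steps described above, Boltzmann sampling of initial conditions followed by classical trajectories that accumulate a time correlation function, can be sketched with NumPy for a one-dimensional harmonic oscillator, where the exact result <x(0)x(t)> = (k_B T / m ω²) cos(ωt) is known. This is not ClassMC code: Metropolis sampling of positions, Gaussian sampling of momenta, velocity-Verlet integration, and all function names and parameters are choices made for this illustration.

```python
import numpy as np


def sample_boltzmann(potential, n_walkers, beta, mass=1.0,
                     n_sweeps=500, step=1.0, seed=0):
    """Positions from exp(-beta V(x)) by Metropolis Monte Carlo, run as many
    independent walkers in parallel; momenta from the exact Gaussian
    exp(-beta p^2 / 2m)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_walkers)
    for _ in range(n_sweeps):
        trial = x + rng.uniform(-step, step, size=n_walkers)
        d_v = potential(trial) - potential(x)
        accept = rng.random(n_walkers) < np.exp(np.minimum(0.0, -beta * d_v))
        x = np.where(accept, trial, x)
    p = rng.normal(0.0, np.sqrt(mass / beta), size=n_walkers)
    return x, p


def position_autocorrelation(x0, p0, force, mass, dt, n_steps):
    """<x(0) x(t)> averaged over trajectories started from (x0, p0),
    integrated with velocity Verlet."""
    x, p = x0.copy(), p0.copy()
    f = force(x)
    corr = [np.mean(x0 * x)]
    for _ in range(n_steps):
        p = p + 0.5 * dt * f
        x = x + dt * p / mass
        f = force(x)
        p = p + 0.5 * dt * f
        corr.append(np.mean(x0 * x))
    return np.array(corr)
```

With V(x) = x²/2, m = 1, and β = 1, the computed correlation should start near 1, pass through roughly 0 at t ≈ π/2, and reach about -1 at t ≈ π, within statistical noise.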


Direct MD (on-the-fly) flux/rate in OpenPathSampling

This module, based on OpenPathSampling, calculates the flux out of a state and through an interface, or the rate of the transition between two states, on the fly while running a trajectory. For more information, check the module documentation here.
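Counting on the fly amounts to a small state machine that processes one frame at a time. The Python sketch below works on a one-dimensional order parameter λ, with state A defined as λ < lam_a, state B as λ > lam_b, and an interface at lam_i in between. It counts the first outward crossings of the interface after each visit to A and the A-to-B transitions, and divides both by the time during which A was the most recently visited state. The definitions and the function name are assumptions made for this illustration; they are not the OpenPathSampling API, which supports more general state and interface definitions.

```python
def flux_and_rate(lambdas, dt, lam_a, lam_i, lam_b):
    """Process frames one by one (assumes lam_a <= lam_i <= lam_b).
    Returns (flux out of A through lam_i, A-to-B rate), both per unit of
    time for which A was the most recently visited state."""
    last_state = None
    armed = False  # True after visiting A, until lam_i is crossed outward
    n_cross = n_trans = 0
    time_a = 0.0
    for lam in lambdas:
        if lam < lam_a:
            last_state, armed = "A", True
        elif armed and lam >= lam_i:
            n_cross += 1
            armed = False
        if lam > lam_b:
            if last_state == "A":
                n_trans += 1
            last_state, armed = "B", False
        if last_state == "A":
            time_a += dt
    if time_a == 0.0:
        return float("nan"), float("nan")
    return n_cross / time_a, n_trans / time_a
```

The armed flag ensures that recrossings of the interface without an intervening return to A are not counted, so the flux only includes first crossings, which is the quantity needed when combining it with path sampling results.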
