Integration of ESL modules into electronic-structure codes

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

The evolutionary pressure on electronic structure software development is greatly increasing, due to the emergence of new paradigms, new kinds of users, new processes, and new tools. Electronic structure software complexity is consequently also increasing, requiring a larger code-maintenance effort. Developers of large electronic structure codes are trying to reduce some of this complexity by moving standardized algorithms into separate libraries [BigDFT-PSolver, ELPA, ELSI, LibXC, LibGridXC, etc.]. This paradigm shift requires library developers to have a hybrid profile, in which scientific and computational skill sets are equally important. These topics have been extensively and publicly discussed between developers of various projects including ABINIT, ASE, ATK, BigDFT, CASTEP, FHI-aims, GPAW, Octopus, Quantum Espresso, SIESTA, and SPR-KKR.

High-quality standardized libraries are not only a highly challenging effort resting in the hands of the library developers; they also open possibilities for codes to take advantage of a standard way of accessing commonly used algorithms. Integrating these libraries, however, requires a significant initial effort that is often sacrificed in favour of new developments, which in turn often do not even reach the mainstream branch of the code. Additionally, there are multiple challenges in adopting new libraries which have their roots in a variety of issues: installation, data structures, physical units and parallelism – all of which are code-dependent. On the other hand, adoption of common libraries ensures the immediate propagation of improvements within the respective library’s field of research and keeps codes up to date with much less effort [LibXC]. Indeed, well-established libraries can have a huge impact on multiple scientific communities at once [PETSc].

In the Electronic Structure community, two issues are emerging. First, libraries are being developed [esl, esl-gitlab] but require an ongoing commitment from the community with respect to sharing the maintenance and development effort. Second, existing codes will only benefit from these libraries by adopting their use. Both issues are mainly governed by the exposure of the libraries and the availability of library core developers, who are typically researchers under pressure from publication deliverables and fund-raising burdens. They are thus not able to commit a large fraction of their time to software development.

An effort to allow code developers to make use of, and contribute to, shared components is needed. This requires efficient coordination of several elements:

– A common and consistent code development infrastructure/education in terms of compilation, installation, testing and documentation.
– Guidance on how to use and integrate already published libraries into existing projects.
– Creating long-lasting synergies between developers to reach a “critical mass” of component contributors.
– Relevant quality metrics (technology and software readiness levels, “TRLs” and “SRLs”), to provide businesses with useful information.

This is what the Electronic Structure Library (ESL)[esl, esl-gitlab] has been doing since 2014: maintaining a wiki and a data-exchange standard, refactoring code of global interest into integrated modules, and regularly organizing workshops, within a wider movement led by the European eXtreme Data and Computing Initiative [exdci].

 

References

[BigDFT-PSolver] http://bigdft.org/Wiki/index.php?title=The_Solver_Package
[ELPA] https://gitlab.mpcdf.mpg.de/elpa/elpa
[ELSI] http://elsi-interchange.org
[LibXC] http://www.tddft.org/programs/libxc/
[LibGridXC] https://launchpad.net/libgridxc
[PETSc] https://www.mcs.anl.gov/petsc/
[esl] http://esl.cecam.org/
[esl-gitlab] http://gitlab.e-cam2020.eu/esl
[exdci] https://exdci.eu/newsroom/press-releases/exdci-towards-common-hpc-strategy-europe


Extended Software Development Workshop: Scaling Electronic Structure Applications

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

The evolutionary pressure on electronic structure software development is greatly increasing, due to the emergence of new paradigms, new kinds of users, new processes, and new tools. The large, feature-rich codes that were once developed within one field are now undergoing heavy restructuring to reach much broader communities, including companies and non-scientific users[1]. More and more use cases and workflows are performed by highly automated frameworks instead of by humans: high-throughput calculations and computational materials design[2], large data repositories[3], and multiscale/multi-paradigm modeling[4], for instance. At the same time, High-Performance Computing Centers are paving the way to exascale, with a cascade of effects on how they operate, from computer architectures[5] to application design[6]. The disruptive paradigm of quantum computing is also putting a big question mark over the relevance of all the ongoing efforts[7].

All these trends are highly challenging for the electronic structure community. Computer architectures have become rapidly moving targets, forcing a global paradigm shift[8]. As a result, well-established but long-ignored good software practices, summarised in the Agile Manifesto[9] nearly 20 years ago, are now being adopted at an accelerating pace by more and more software projects[10]. With time, this kind of migration is becoming a question of survival, the key to a successful transformation being to enable and preserve enhanced collaboration among the increasing number of disciplines involved. Significant integration efforts from code developers are also necessary, since both hardware and software paradigms have to change at once[11].

Two major issues are also coming from the community itself. Hybrid developer profiles, with people fluent in both computational and scientific matters, are still difficult to find and retain. In the long run, the numerous ongoing training initiatives will gradually improve the situation; in the short run, the issue is becoming more salient and painful, because the context evolves faster than ever. Good practices have usually been the first element sacrificed in the “publish or perish” race. New features have typically been bound to the duration of a post-doc contract and left undocumented and poorly tested, favoring the unsustainable “reinventing the wheel” syndrome.

Addressing these issues requires coordinated efforts at multiple levels:
– from a methodological perspective, mainly through the creation of open standards and the use of co-design, both for programming and for data[12];
– regarding documentation, with a significant leap in content policies, helped by tools like Doxygen and Sphinx, as well as publication platforms like ReadTheDocs[13];
– for testing, by introducing test-driven development concepts and systematically publishing test suites together with software[14];
– considering deployment, by creating synergies with popular software distribution systems[15];
– socially, by disseminating the relevant knowledge and training the community, through the release of demonstrators and giving all stakeholders the opportunity to meet regularly[16].

This is what the Electronic Structure Library (ESL)[17] has been doing since 2014: maintaining a wiki and a data-exchange standard, refactoring code of global interest into integrated modules, and regularly organising workshops, within a wider movement led by the European eXtreme Data and Computing Initiative (EXDCI)[18].

Since 2014, the Electronic Structure Library has been steadily growing and developing to cover most fundamental tasks required by electronic structure codes. In February 2018 an extended software development workshop will be held at CECAM-HQ with the purpose of building demonstrator codes providing powerful, non-trivial examples of how the ESL libraries can be used. These demonstrators will also provide a platform to test the performance and usability of the libraries in an environment as close as possible to real-life situations. This marks a milestone and enables the next step in the ESL development: going from a collection of libraries with a clear set of features and stable interfaces to a bundle of highly efficient, scalable and integrated implementations of those libraries.

Many libraries developed within the ESL perform low-level tasks or very specific steps of more complex algorithms and are not capable, by themselves, of reaching exascale performance. Nevertheless, if they are to be used as efficient components of exascale codes, they must provide some level of parallelism and be as efficient as possible on a wide variety of architectures. During this workshop, we propose to perform advanced performance and scalability profiling of the ESL libraries. With that knowledge in hand, it will be possible to select and implement the best strategies for parallelizing and optimizing the libraries. Assistance from HPC experts will be essential and is a unique opportunity to foster collaborations with other Centres of Excellence, like PoP (https://pop-coe.eu/) and MaX (http://www.max-centre.eu/).

Based on the successful experience of the previous ESL workshops, we propose to divide the workshop into two parts. The first two days will be dedicated to initial discussions between the participants and other invited stakeholders, and to presentations on state-of-the-art methodological and software developments, performance analysis and scalability of applications. The remainder of the workshop will consist of a 12-day coding effort by a smaller team of experienced developers. Both the discussion and the software development will take advantage of the ESL infrastructure (wiki, GitLab, etc.) that was set up during the previous ESL workshops.

[1] See http://www.nanogune.eu/es/projects/spanish-initiative-electronic-simulations-thousands-atoms-codigo-abierto-con-garantia-y and
[2] See http://pymatgen.org/ and http://www.aiida.net/ for example.
[3] http://nomad-repository.eu/
[4] https://abidev2017.abinit.org/images/talks/abidev2017_Ghosez.pdf
[5] http://www.deep-project.eu/
[6] https://code.grnet.gr/projects/prace-npt/wiki/StarSs
[7] https://www.newscientist.com/article/2138373-google-on-track-for-quantum-computer-breakthrough-by-end-of-2017/
[8] https://arxiv.org/pdf/1405.4464.pdf (sustainable software engineering)
[9] http://agilemanifesto.org/
[10] Several long-running projects routinely use modern bug trackers and continuous integration, e.g.: http://gitlab.abinit.org/, https://gitlab.com/octopus-code/octopus, http://qe-forge.org/, https://launchpad.net/siesta
[11] Transition of HPC Towards Exascale Computing, Volume 24 of Advances in Parallel Computing, E.H. D’Hollander, IOS Press, 2013, ISBN: 9781614993247
[12] See https://en.wikipedia.org/wiki/Open_standard and https://en.wikipedia.org/wiki/Participatory_design
[13] See http://www.doxygen.org/, http://www.sphinx-doc.org/, and http://readthedocs.org/
[14] See https://en.wikipedia.org/wiki/Test-driven_development and http://agiledata.org/essays/tdd.html
[15] See e.g. http://www.etp4hpc.eu/en/esds.html
[16] See e.g. https://easybuilders.github.io/easybuild/, https://github.com/LLNL/spack, https://github.com/snapcore/snapcraft, and https://www.macports.org/ports.php?by=category&substr=science
[17] http://esl.cecam.org/
[18] https://exdci.eu/newsroom/press-releases/exdci-towards-common-hpc-strategy-europe


SCDM_WFs

 
Module SCDM_WFs implements the selected columns of the density matrix (SCDM) method [1] for building localized Wannier functions (WFs). Wannier90 [2] is a post-processing tool for the computation of Maximally Localised Wannier Functions (MLWFs) [3,4,5], which have been increasingly adopted by the electronic structure community for different purposes. The reasons are manifold: MLWFs provide an insightful chemical analysis of the nature of bonding, and of its evolution during, say, a chemical reaction. They play for solids a role similar to that of localized orbitals in molecular systems. In the condensed matter community, they are used in the construction of model Hamiltonians for, e.g., correlated-electron and magnetic systems. They are also pivotal in first-principles tight-binding approaches, where chemically accurate Hamiltonians are constructed directly in the Wannier basis rather than fitted or inferred from macroscopic considerations. Further applications include dielectric response and polarization in materials, ballistic transport, analysis of phonons, photonic crystals, cold-atom lattices, and the local dielectric response of insulators; see [3] for a review.

This module is a first step towards the automation of MLWFs. In the original Wannier90 framework, automation is hindered by the difficult step of choosing a set of initial localized functions, with the correct symmetries and centers, to use as an initial guess for the optimization. As a result, high-throughput calculations (HTC) and big-data analysis with MLWFs have proved problematic to implement.

This module is part of the newly developed Wannier90 utilities within the pilot project on Electronic Structure Functionalities for Multi-Thread Workflows. The module is part of the pw2wannier interface between the popular QUANTUM ESPRESSO code and Wannier90. It will be included in the next versions of QUANTUM ESPRESSO (v6.3) and Wannier90. Moreover, it has been successfully added to a developer branch of the AiiDA workflow engine [6] to perform HTC on large material datasets.

Practical application and exploitation of the code

The SCDM-k method [1] removes the need for an initial guess altogether by using information contained in the single-particle density matrix. In fact, the columns of the density matrix are localized in real space and can be used as a vocabulary to build the localized WFs. The SCDM-k method can be used in isolation to generate well-localized WFs. More interesting is the possibility of coupling the SCDM-k method to Wannier90: the core idea is to use WFs generated by the SCDM-k method as the initial guess for the optimization procedure within Wannier90. This module is therefore a big step towards the automation of WFs and the simplification of the use of the Wannier90 program. It is intended for all scientists who benefit from the use of WFs in their research. Furthermore, by making the code more accessible and easier to use, this module will certainly increase the popularity of the Wannier90 code.
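To make the column-selection idea concrete, here is a minimal NumPy/SciPy sketch of the SCDM construction at a single k-point (written for this summary as an illustration; the production implementation lives in the pw2wannier interface, and the function name and array layout here are our own choices):

```python
import numpy as np
from scipy.linalg import qr

def scdm_localize(psi):
    """psi: (n_grid, n_bands) orthonormal occupied orbitals on a real-space grid.
    Returns an orthonormal set of localized orbitals via SCDM."""
    n_grid, n_bands = psi.shape
    # Column-pivoted QR of psi^H picks the grid points at which the columns
    # of the density matrix P = psi psi^H are most linearly independent.
    _, _, piv = qr(psi.conj().T, mode='economic', pivoting=True)
    cols = piv[:n_bands]                 # selected real-space points
    B = psi[cols, :].conj().T            # (n_bands, n_bands) mixing matrix
    # psi @ B are the selected columns of P: localized but non-orthogonal.
    # Loewdin orthonormalization (via SVD of B) preserves the localization.
    u, _, vh = np.linalg.svd(B)
    return psi @ (u @ vh)
```

Orbitals produced by such a procedure are what would be fed to Wannier90 as the initial guess, in place of hand-picked projection functions.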

 
[1] A. Damle, L. Lin, L. Ying, SCDM-k: Localized orbitals for solids via selected columns of the density matrix, J. Comput. Phys. 334 (2017) 1
[2] A. A. Mostofi, J. R. Yates, Y.-S. Lee, I. Souza, D. Vanderbilt, N. Marzari, wannier90: A tool for obtaining maximally-localised Wannier functions, Comput. Phys. Commun. 178 (2008) 685
[3] N. Marzari, A. A. Mostofi, J. R. Yates, I. Souza, D. Vanderbilt, Maximally localized Wannier functions: Theory and applications, Rev. Mod. Phys. 84 (2012) 1419
[4] N. Marzari, D. Vanderbilt, Maximally localized generalized Wannier functions for composite energy bands, Phys. Rev. B 56 (1997) 12847
[5] I. Souza, N. Marzari, D. Vanderbilt, Maximally localized Wannier functions for entangled energy bands, Phys. Rev. B 65 (2001) 035109
[6] G. Pizzi, A. Cepellotti, R. Sabatini, N. Marzari, B. Kozinsky, AiiDA: automated interactive infrastructure and database for computational science, Comp. Mat. Sci. 111 (2016) 218


State-of-the-Art Workshop: Improving the accuracy of ab-initio predictions for materials

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

Ab-initio simulation methods are the major tool for research in condensed matter physics, materials science, and quantum and molecular chemistry. They can be classified in terms of their accuracy and efficiency, but typically more accurate means less efficient and vice-versa. The accuracy depends mainly on how accurately one can solve the electronic problem. The most accurate algorithms are the wave-function-based methods, such as Full CI, Coupled Cluster (CC), and Quantum Monte Carlo (QMC), followed by the Density Functional Theory (DFT)-based methods, and finally more approximate methods such as Tight Binding. Another important consideration is how the accuracy of a given method scales with the size of the system under consideration. Among the wave-function-based methods, the accuracy of traditional quantum chemistry methods can be systematically improved, but their scaling with system size limits their applicability to small molecules. On the other hand, QMC methods have a much more tractable scaling and, because their energies are variational upper bounds, offer a way of systematically improving the accuracy in spite of the “fermion sign problem” and the commonly used fixed-node approximation. Recently there has been much progress in the use of pseudopotentials and in the systematic improvement of nodal surfaces using backflow and multiple determinants [1, 2, 3]. Conversely, DFT-based methods rest on a plethora of different self-consistent mean-field approximations, each one tuned to best represent a class of systems but with limited transferability. Despite progress in developing more general functionals [4, 5, 6], DFT is missing an “internal” accuracy scale; its accuracy is generally established against more fundamental theories (like CC or QMC) or against experiments. DFT methods are very popular because of their favorable scaling with system size, the same as for QMC, but with a smaller prefactor.
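For clarity, the variational argument invoked above can be stated compactly (a standard result, added here for the reader rather than taken from the workshop description): for any trial wave function Ψ_T,

```latex
% Variational bounds underlying the systematic improvability of QMC:
% the fixed-node energy lies between the exact ground-state energy and
% the variational estimate of the trial wave function.
\[
  E_0 \;\le\; E_{\mathrm{FN}}[\Psi_T] \;\le\;
  E_V[\Psi_T] \;=\;
  \frac{\langle \Psi_T \,|\, \hat{H} \,|\, \Psi_T \rangle}
       {\langle \Psi_T \,|\, \Psi_T \rangle} .
\]
```

Any improvement of the nodal surface (e.g. via backflow or multi-determinant expansions) can therefore only lower the fixed-node energy towards the exact ground state, which is what makes the accuracy systematically improvable.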
In a number of recent applications [7, 8] it was found that the inclusion of nuclear quantum effects (NQE) worsens considerably the agreement between DFT predictions and experiments. This is ascribed to the inaccuracies of DFT, and it illustrates the importance of improving DFT functionals not against experimental data alone but also against calculations using more fundamental methods. There has been a recent effort to establish the accuracy of DFT approximations by benchmarking with QMC calculations, not only for equilibrium geometries but also for thermal configurations. This benchmarking can be customized for individual molecules at a given temperature, pressure and geometry [9, 10, 11, 12].
Another important aspect concerns finite-size effects in modelling extended systems. Although corrections can be developed for homogeneous systems, for more complex situations with several characteristic length scales one needs to consider system sizes that cannot be tackled by ab-initio methods. In these applications one needs to use an effective interaction energy. A recent development is the use of Machine Learning (ML) techniques to obtain energy functions with ab-initio accuracy [13, 14, 15]. The assessment of their transferability and accuracy is still partly unsolved, but progress is rapid. A related development is the use of ML methods to bypass the Kohn-Sham paradigm of DFT and directly address the potential-density map [16, 17, 18].

The following is a list of topics that will be discussed during the meeting:
• Benchmarking existing DFT functionals with QMC. DFT has the potential to be accurate, but the main problem with its predictive power is that its accuracy can be system dependent. QMC was instrumental in developing the first exchange-correlation approximations (e.g. LDA), and we envisage that it can play a substantial role in helping the discovery and tuning of new functionals. In particular, the tuning of dispersion interactions appears to be a crucial element still not fully controlled in modern DFT approximations, while it plays a crucial role in many systems, like hydrogen and hydrogen-based materials such as water.
• ML approaches with QMC accuracy. Machine Learning (ML) has attracted significant interest recently, mainly because of its potential to study real-life systems and to explore the phase space at a scale that is not accessible to ab-initio methods. However, crucial for any ML method is the quality of the training set. It is often possible to train an ML potential on small systems, where accurate energies and forces can be obtained by quantum chemistry methods; however, training sets including larger systems are needed. QMC has the potential to provide them, especially going forward with exascale computing.
• Opportunities for new exascale applications of QMC to impact simulations of larger systems and longer time scales. QMC is capable of exploiting parallelism very efficiently, and is probably one of the few methods already capable of running at the exascale level. ML methods on large data sets are also inherently parallel and directly usable on exascale machines.
• We will address the problem of using and testing force fields derived for small systems on systems of much larger size.
• We will discuss the use of ML methods to derive new classes of wave functions for QMC calculations of complex systems.

[1] J. Kolorenc and L. Mitas, Rep. Prog. Phys. 74, 1 (2010).
[2] L. K. Wagner and D. M. Ceperley, Rep. Prog. Phys. 79, 094501 (2016).
[3] M. Taddei, M. Ruggeri, S. Moroni, and M. Holzmann, Phys. Rev. B 91, 115106 (2015).
[4] J. Heyd, G. Scuseria, and M. Ernzerhof, The Journal of Chemical Physics 118, 8207 (2003).
[5] K. Lee, É. Murray, L. Kong, B. Lundqvist, and D. Langreth, Physical Review B 82, 81101 (2010).
[6] K. Berland et al., Reports on Progress in Physics 78, 66501 (2015).
[7] M. A. Morales, J. McMahon, C. Pierleoni, and D. M. Ceperley, Physical Review Letters 110, 65702 (2013).
[8] M. Rossi, P. Gasparotto, and M. Ceriotti, Physical Review Letters 117, 115702 (2016).
[9] R. C. Clay et al., Physical Review B 89, 184106 (2014).
[10] M. A. Morales et al., Journal of Chemical Theory and Computation 10, 2355 (2014).
[11] R. C. Clay, M. Holzmann, D. M. Ceperley, and M. A. Morales, Physical Review B 93, 035121 (2016).
[12] M. J. Gillan, F. Manby, M. Towler, and D. Alfè, The Journal of Chemical Physics 136, 244105 (2012).
[13] K. V. J. Jose, N. Artrith, and J. Behler, Journal of Chemical Physics 136, 194111 (2012).
[14] J. Behler, The Journal of Chemical Physics 145, 170901 (2016).
[15] V. Botu, R. Batra, J. Chapman, and R. Ramprasad, The Journal of Physical Chemistry C 121, 511 (2016).
[16] J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, and K. Burke, Physical Review Letters 108, 253002 (2012).
[17] L. Li, T. E. Baker, S. R. White, and K. Burke, Phys. Rev. B 94, 245129 (2016).
[18] F. Brockherde et al., arXiv:1609.02815v3 (2017).


Symmetry Adapted Wannier Functions – a Component of the Wannier90 Code

 

Symmetry Adapted Wannier Functions is a module within Wannier90 devoted to the construction of Wannier functions (WFs) with a given symmetry. The procedure implemented in this module enables one to control the symmetry and center of the WFs, and also simplifies the minimisation of the spread functional under these symmetry constraints.

This module is one of the nine modules reported in Deliverable D2.3, which together cover: the implementation of symmetry-adapted WFs, to improve the symmetry of the WFs and of related electronic-structure quantities such as the band structure and density of states; improvements in the interpolation of band structures; developments in the selection of the k-point mesh to increase accuracy; the ability to perform non-collinear spin calculations; and interface-layer modules to tight-binding codes.

Starting from the E-CAM ESDW3 in San Sebastian organised by the Wannier90 developers, a set of nine modules was produced to meet the desire of the electronic-structure community to extend the use of WFs, and in particular of Maximally Localised Wannier Functions (MLWFs), to a broader class of physical and chemical problems by adding new functionality to the Wannier90 code.

All modules are accessible through the Wannier90 code, which in turn is interfaced with all the most popular DFT codes. Wannier90 is used as a post-processing tool. Therefore, the end users of electronic-structure codes (DFT, Tight Binding and Quantum Monte Carlo codes) that are interfaced with these modules via Wannier90 will benefit from the functionalities they provide, e.g. WFs with improved symmetry, spin-orbit calculations, etc., and can focus on developing new ideas and new science without needing to rewrite functionality that is already established.

Practical application and exploitation of the code

Wannier functions are an important class of functions which enable one to obtain a real-space picture of the electronic structure of a system. They provide an insightful chemical analysis of the nature of bonding, and chemical reaction in condensed-matter physics, similar to the role played by localised molecular orbitals in chemistry. They are also a powerful tool in the study of dielectric properties via the modern theory of polarisation. In the condensed-matter community WFs are employed in the construction of model Hamiltonians for, e.g., correlated-electron and magnetic systems (to study new quantum phases of matter) and are used as building blocks in first-principles Tight Binding Hamiltonians, where chemically accurate Hamiltonians are constructed directly on the Wannier basis, rather than fitted or inferred from macroscopic considerations. [1]

Wannier90 [2] is a program that, for a given system, generates, among the class of all possible WFs, the Wannier functions with minimum spatial spread, known as MLWFs. The locality of MLWFs can be exploited to compute, among other things, the band structure, density of states and Fermi surfaces at modest computational cost.
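As an illustration of how this locality is exploited, the sketch below performs a generic Wannier (Fourier) interpolation of a band structure. It is not code from Wannier90 itself: the dictionary HR of real-space Hamiltonian blocks in the MLWF basis is assumed to come from a converged MLWF construction (Wannier90 can write such data, e.g. in its seedname_hr.dat file).

```python
import numpy as np

def interpolate_bands(HR, kpath):
    """Wannier interpolation: eigenvalues of
    H(k) = sum_R exp(2*pi*i k.R) H(R) along a k-path (reduced coordinates).
    HR maps integer lattice vectors R (3-tuples) to n x n complex blocks;
    Hermiticity of H(k) requires HR[-R] = HR[R]^H."""
    bands = []
    for k in kpath:
        Hk = sum(np.exp(2j * np.pi * np.dot(k, R)) * H
                 for R, H in HR.items())
        bands.append(np.linalg.eigvalsh(Hk))
    return np.array(bands)
```

Because MLWFs are exponentially localized, only a small number of lattice vectors R carry non-negligible blocks, so arbitrarily dense k-paths can be evaluated at negligible cost compared to a self-consistent calculation.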

The developed modules have been used to study the properties of strongly correlated materials and to assess the quality of high-level quantum methods. [3]

 

[1] N. Marzari, A. A. Mostofi, J. R. Yates, I. Souza, D. Vanderbilt, Maximally localized Wannier functions: Theory and applications, Rev. Mod. Phys. 84 (2012) 1419

[2] A. A. Mostofi, J. R. Yates, Y.-S. Lee, I. Souza, D. Vanderbilt, N. Marzari, wannier90: A tool for obtaining maximally-localised Wannier functions, Comput. Phys. Commun. 178 (2008) 685

[3] L. Boehnke, F. Nilsson, F. Aryasetiawan, P. Werner, When strong correlations become weak: Consistent merging of GW and DMFT, Phys. Rev. B 94 (2016) 201106


New report published: Identification / Selection of E-CAM Electronic Structure Codes for Development

 

Read our latest report on state-of-the-art codes and methods in Quantum Monte Carlo, Density Functional Theory (DFT) and beyond-DFT methods. The report contains a review of the software available in these areas and of the basic features that the majority of these codes have in common, with a view to modularisation. Based on that, a list of software development projects to be developed by E-CAM is discussed.

Full report available here.


Scientific reports from the 2017 E-CAM workshops are now available on our website

 

The scientific reports* from the following workshops conducted in year 2 of the project E-CAM (2017):

  1. E-CAM Scoping Workshop: “From the Atom to the Material”, 18-20 September 2017, University of Cambridge, UK,
  2. E-CAM State-of-the-Art Workshop WP4: Meso and Multiscale Modelling, 29 May – 1 June 2017, University College Dublin, Ireland,

are now available for download on our website at this location. Furthermore, they will also be integrated into the CECAM Report of Activities 2017, published every year on the website www.cecam.org.

Each report includes:

  • an overview of the remit of the workshop,
  • the workshop program,
  • the list of attendees,
  • the major outcomes,
  • how these outcomes relate to community needs,
  • how the recommendations could be funded,
  • how they relate to society and industry,
  • and the emphasis and impact on software development.

 

*© CECAM 2017, all rights reserved.

Please address any comments or questions to info@e-cam2020.eu.


Metal-ion force field developed by E-CAM using a novel Machine Learning procedure is now available for download

 

The database of force fields developed by the SMART group at SNS (Pisa, Italy), including the metal-ion force fields optimized within E-CAM using a novel Machine Learning procedure (reported in a recent publication [1] and in an E-CAM case study here), is now available for download at http://smart.sns.it/vmd_molecules/.

[1] Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone, Force Field Parametrization of Metal Ions from Statistical Learning Techniques, J. Chem. Theory Comput. 2018, 14, 255−273 DOI: 10.1021/acs.jctc.7b00779

 


The simulation of metal ions in protein-water systems using machine learning: An E-CAM case study and conversation

 

With Dr. Francesco Fracchia, Scuola Normale Superiore di Pisa

Interviewer: Dr. Donal Mackernan, University College Dublin

 

Abstract

One quarter to one third of all proteins require metals to function, but the description of metal ions in standard force fields is still quite primitive. In this case study and interview, an E-CAM project to develop a suitable parameterisation using machine learning is described. The training scheme combines classical simulation with electronic structure calculations to produce a force field comprising standard classical force fields with additional terms for the metal ion-water and metal ion-protein interactions. The approach allows simulations to run as fast as standard molecular dynamics codes, and is suitable for efficient massively parallel scale-up.
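The statistical-learning idea behind such a scheme can be sketched in a few lines: keep the standard classical force field fixed and fit only the additional metal-ion terms to the residual between quantum-mechanical and classical energies. The snippet below is a toy linear least-squares version on synthetic data, written for this summary; the actual procedure (Fracchia et al., J. Chem. Theory Comput. 2018) is considerably more sophisticated.

```python
import numpy as np
from numpy.linalg import lstsq

# Hypothetical shapes: E_qm (n_frames,) QM reference energies,
# E_classical (n_frames,) energies from the unmodified classical FF,
# X (n_frames, n_terms) values of candidate metal-ion interaction basis
# functions (e.g. powers of inverse ion-ligand distances) per frame.
rng = np.random.default_rng(0)
n_frames, n_terms = 500, 8
X = rng.normal(size=(n_frames, n_terms))
true_c = rng.normal(size=n_terms)
E_classical = rng.normal(size=n_frames)
E_qm = E_classical + X @ true_c + 0.01 * rng.normal(size=n_frames)

# Fit only the additional metal-ion terms to the residual energies,
# leaving the standard classical force field untouched.
coef, *_ = lstsq(X, E_qm - E_classical, rcond=None)
```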

Continue reading…


GRASP Sampling – a module to build a representative data set for a fitting procedure

GRASP_sampling performs a stratified sampling of the configurations of a system, each described by a vector, to build a representative training set for a fitting procedure. Given a list of candidate configurations and the required size N of the training set, the module performs the combinatorial optimization that maximizes the following dissimilarity score (DS) among the elements of the training set:

DS = \sum_{l=1}^{N} \sum_{j=1}^{M} e^{-j} \, d_{lj}

In this formula, the j-th configuration in the sum is the j-th nearest one to the l-th configuration, and d_{lj} is the Euclidean distance between the l-th and j-th configurations. M is the number of nearest configurations considered in the score. The exponential weight makes the score nearly independent of the particular value of M, provided it is larger than 4-6.

The combinatorial optimization that maximizes the dissimilarity score is performed using the greedy randomized adaptive search procedure (GRASP) algorithm [1]. A stratified sampling can be performed without combinatorial optimization using classical statistical techniques (for example, Latin hypercube sampling); GRASP sampling becomes useful when the selection is restricted to a predetermined set of configurations, generated or sampled with specific internal constraints. This is the case for the molecular configurations generated in a molecular dynamics simulation. A minimal sketch of the score and of the selection loop is given below.
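The following is a minimal, illustrative Python sketch of the score and of a GRASP-style randomized greedy construction. The function names, the greediness parameter alpha, and the omission of the local-search phase of a full GRASP are choices made for this illustration; for the actual implementation, see the module repository.

```python
import numpy as np

def dissimilarity_score(X, M=5):
    """DS = sum_l sum_{j=1..M} exp(-j) * d_lj, where d_lj is the Euclidean
    distance from configuration l to its j-th nearest neighbour in X."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)                      # row l: d_ll = 0 first, then neighbours
    m = min(M, len(X) - 1)
    w = np.exp(-np.arange(1, m + 1))    # exponential weights e^{-j}
    return float(np.sum(w * D[:, 1:m + 1]))

def grasp_select(X, N, n_starts=20, alpha=0.3, seed=0):
    """Randomized greedy construction of an N-element training set that
    maximizes the DS (the local-search phase is omitted for brevity).
    alpha interpolates between pure greedy (0) and pure random (1)."""
    rng = np.random.default_rng(seed)
    best_sel, best_ds = None, -np.inf
    for _ in range(n_starts):
        sel = [int(rng.integers(len(X)))]
        while len(sel) < N:
            cand = [i for i in range(len(X)) if i not in sel]
            gains = [dissimilarity_score(X[sel + [i]]) for i in cand]
            cut = max(gains) - alpha * (max(gains) - min(gains))
            rcl = [i for i, g in zip(cand, gains) if g >= cut]
            sel.append(int(rng.choice(rcl)))   # restricted candidate list
        ds = dissimilarity_score(X[sel])
        if ds > best_ds:
            best_sel, best_ds = sel, ds
    return best_sel, best_ds

# Example: pick 10 representative configurations out of 200 MD snapshots,
# each described here by a random 30-dimensional feature vector.
X = np.random.default_rng(1).normal(size=(200, 30))
subset, score = grasp_select(X, N=10)
```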

The complete module documentation, including a link to the source code, can be found in our repository here.

Motivation and exploitation

The application of the GRASP algorithm to perform a stratified sampling is described in a recent publication [2] by the E-CAM partners at Scuola Normale Superiore (SNS), which we previously reported here.

The motivation behind this software module is the pilot project with industry “Quantum Mechanical Parameterisation of Metal Ions in Proteins”, carried out by an E-CAM postdoctoral researcher at SNS.

 

[1] Feo, T. A.; Resende, M. G. Greedy randomized adaptive search procedures. J. Glob. Optim. 1995, 6, 109−133

[2] Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone, Force Field Parametrization of Metal Ions from Statistical Learning Techniques, J. Chem. Theory Comput. 2018, 14, 255−273
