Extended Software Development Workshop: Scaling Electronic Structure Applications

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

The evolutionary pressure on electronic structure software development is greatly increasing, due to the emergence of new paradigms, new kinds of users, new processes, and new tools. The large, feature-rich codes that were once developed within a single field are now undergoing heavy restructuring to reach much broader communities, including companies and non-scientific users[1]. More and more use cases and workflows are carried out by highly automated frameworks instead of humans: high-throughput calculations and computational materials design[2], large data repositories[3], and multiscale/multi-paradigm modeling[4], for instance. At the same time, High-Performance Computing Centers are paving the way to exascale, with a cascade of effects on how they operate, from computer architectures[5] to application design[6]. The disruptive paradigm of quantum computing is also putting a big question mark over the relevance of all these ongoing efforts[7].

All these trends are highly challenging for the electronic structure community. Computer architectures have become rapidly moving targets, forcing a global paradigm shift[8]. As a result, well-established but long-ignored software good practices, summarised in the Agile Manifesto[9] nearly 20 years ago, are now being adopted at an accelerating pace by more and more software projects[10]. Over time, this kind of migration is becoming a question of survival, and the key to a successful transformation is to enable and preserve enhanced collaboration between the growing number of disciplines involved. Code developers also have to make significant integration efforts, since hardware and software paradigms must change at the same time[11].

Two major issues also come from the community itself. Hybrid developer profiles, i.e. people fluent in both computational and scientific matters, are still difficult to find and retain. In the long run, the numerous ongoing training initiatives will gradually improve the situation, while in the short run the issue is becoming more salient and painful, because the context evolves faster than ever. In addition, good practices have usually been the first element sacrificed in the “publish or perish” race. New features have often been tied to the duration of a postdoc contract and left undocumented and poorly tested, favouring the unsustainable “reinventing the wheel” syndrome.

Addressing these issues requires coordinated efforts at multiple levels:
– from a methodological perspective, mainly through the creation of open standards and the use of co-design, both for programming and for data[12];
– regarding documentation, with a significant leap in content policies, helped by tools like Doxygen and Sphinx, as well as publication platforms like ReadTheDocs[13];
– for testing, by introducing test-driven development concepts and systematically publishing test suites together with software[14];
– considering deployment, by creating synergies with popular software distribution systems[15];
– socially, by disseminating the relevant knowledge and training the community, through the release of demonstrators and giving all stakeholders the opportunity to meet regularly[16].

This is what the Electronic Structure Library (ESL)[17] has been doing since 2014, through a wiki, a data-exchange standard, the refactoring of code of global interest into integrated modules, and the regular organisation of workshops, within a wider movement led by the European eXtreme Data and Computing Initiative (EXDCI)[18].

Since 2014, the Electronic Structure Library has been steadily growing and developing to cover most fundamental tasks required by electronic structure codes. In February 2018 an extended software development workshop will be held at CECAM-HQ with the purpose of building demonstrator codes providing powerful, non-trivial examples of how the ESL libraries can be used. These demonstrators will also provide a platform to test the performance and usability of the libraries in an environment as close as possible to real-life situations. This marks a milestone and enables the next step in the ESL development: going from a collection of libraries with a clear set of features and stable interfaces to a bundle of highly efficient, scalable and integrated implementations of those libraries.

Many libraries developed within the ESL perform low-level tasks or very specific steps of more complex algorithms and are not capable, by themselves, of reaching exascale performance. Nevertheless, if they are to be used as efficient components of exascale codes, they must provide some level of parallelism and be as efficient as possible on a wide variety of architectures. During this workshop, we propose to perform advanced performance and scalability profiling of the ESL libraries. With that knowledge in hand, it will be possible to select and implement the best strategies for parallelizing and optimizing the libraries. Assistance from HPC experts will be essential and offers a unique opportunity to foster collaborations with other Centres of Excellence, like PoP (https://pop-coe.eu/) and MaX (http://www.max-centre.eu/).
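As a minimal illustration of the kind of metrics such a profiling campaign reports, the sketch below computes the strong-scaling speedup S(p) = T(p_ref)/T(p) and the parallel efficiency E(p) = S(p)·p_ref/p from wall-clock timings of a fixed-size problem, together with the Amdahl's-law bound for a given serial fraction. The function names and timing values are hypothetical; they are not taken from any ESL library or from the PoP methodology.

```python
def scaling_report(timings):
    """Strong-scaling speedup and parallel efficiency from measured wall times.

    timings: dict mapping core count -> wall-clock time (same problem size).
    The smallest core count is used as the reference run.
    """
    p_ref = min(timings)
    t_ref = timings[p_ref]
    report = {}
    for p in sorted(timings):
        speedup = t_ref / timings[p]          # S(p) = T(p_ref) / T(p)
        efficiency = speedup * p_ref / p      # E(p) = S(p) / (p / p_ref)
        report[p] = (speedup, efficiency)
    return report


def amdahl_speedup(serial_fraction, p):
    """Upper bound on the speedup with p workers when a fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


if __name__ == "__main__":
    # Hypothetical timings (seconds) of one library call on 1 to 8 cores
    measured = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}
    for p, (s, e) in scaling_report(measured).items():
        print(f"{p:>3} cores: speedup {s:5.2f}, efficiency {e:6.1%}")
    print(f"Amdahl bound on 8 cores with 5% serial work: {amdahl_speedup(0.05, 8):.2f}")
```

Measured speedups that fall well below the Amdahl bound can hint at load imbalance or communication overhead, which is where a dedicated profiling tool becomes necessary.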

Based on the successful experience of the previous ESL workshops, we propose to divide the workshop into two parts. The first two days will be dedicated to initial discussions between the participants and other invited stakeholders, and to presentations on state-of-the-art methodological and software developments, performance analysis and scalability of applications. The remainder of the workshop will consist of a 12-day coding effort by a smaller team of experienced developers. Both the discussions and the software development will take advantage of the ESL infrastructure (wiki, GitLab, etc.) that was set up during the previous ESL workshops.

[1] See http://www.nanogune.eu/es/projects/spanish-initiative-electronic-simulations-thousands-atoms-codigo-abierto-con-garantia-y and
[2] See http://pymatgen.org/ and http://www.aiida.net/ for example.
[3] http://nomad-repository.eu/
[4] https://abidev2017.abinit.org/images/talks/abidev2017_Ghosez.pdf
[5] http://www.deep-project.eu/
[6] https://code.grnet.gr/projects/prace-npt/wiki/StarSs
[7] https://www.newscientist.com/article/2138373-google-on-track-for-quantum-computer-breakthrough-by-end-of-2017/
[8] https://arxiv.org/pdf/1405.4464.pdf (sustainable software engineering)
[9] http://agilemanifesto.org/
[10] Several long-running projects routinely use modern bug trackers and continuous integration, e.g.: http://gitlab.abinit.org/, https://gitlab.com/octopus-code/octopus, http://qe-forge.org/, https://launchpad.net/siesta
[11] Transition of HPC Towards Exascale Computing, Volume 24 of Advances in Parallel Computing, E.H. D’Hollander, IOS Press, 2013, ISBN: 9781614993247
[12] See https://en.wikipedia.org/wiki/Open_standard and https://en.wikipedia.org/wiki/Participatory_design
[13] See http://www.doxygen.org/, http://www.sphinx-doc.org/, and http://readthedocs.org/
[14] See https://en.wikipedia.org/wiki/Test-driven_development and http://agiledata.org/essays/tdd.html
[15] See e.g. https://easybuilders.github.io/easybuild/, https://github.com/LLNL/spack, https://github.com/snapcore/snapcraft, and https://www.macports.org/ports.php?by=category&substr=science
[16] See e.g. http://www.etp4hpc.eu/en/esds.html
[17] http://esl.cecam.org/
[18] https://exdci.eu/newsroom/press-releases/exdci-towards-common-hpc-strategy-europe


State-of-the-Art Workshop: Improving the accuracy of ab-initio predictions for materials

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

Ab initio simulation methods are the main tools for research in condensed matter physics, materials science, and quantum and molecular chemistry. They can be classified in terms of their accuracy and efficiency, but typically more accurate means less efficient and vice versa. The accuracy depends mainly on how accurately one can solve the electronic problem. The most accurate algorithms are the wave-function-based methods, such as Full CI, Coupled Cluster (CC), and Quantum Monte Carlo (QMC), followed by methods based on Density Functional Theory (DFT), and finally by more approximate methods such as Tight-Binding. Another important consideration is how the accuracy of a given method, and its cost, scale with the size of the system under consideration. Among the wave-function-based methods, the accuracy of traditional quantum chemistry methods can be systematically improved, but their scaling with system size limits their applicability to small molecules. QMC methods, on the other hand, have a much more tractable scaling and, because their energies are variational upper bounds, offer a way of systematically improving the accuracy, in spite of the “fermion sign problem” and the commonly used fixed-node approximation. Recently there has been much progress in the use of pseudopotentials and in the systematic improvement of nodal surfaces using backflow and multiple determinants [1, 2, 3]. Conversely, DFT-based methods rely on a plethora of different self-consistent mean-field approximations, each one tuned to best represent a class of systems but with limited transferability. Despite progress in developing more general functionals [4, 5, 6], DFT lacks an “internal” accuracy scale; its accuracy is generally established against more fundamental theories (like CC or QMC) or against experiments. DFT methods are very popular because of their favourable scaling with system size, the same as for QMC, but with a smaller prefactor.
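The statement that QMC energies are variational upper bounds can be illustrated with the simplest QMC flavour, variational Monte Carlo (VMC), rather than the fixed-node diffusion Monte Carlo referred to above. The sketch assumes a hydrogen atom in Hartree atomic units and a one-parameter trial wave function psi(r) = exp(-alpha r). The exact ground-state energy is -0.5 Ha, and the analytic variational energy E(alpha) = alpha^2/2 - alpha is never below it. The function names and sampling parameters are illustrative choices.

```python
import math
import random


def local_energy(r, alpha):
    """E_L = (H psi) / psi for psi = exp(-alpha * r) in the hydrogen atom (Hartree units)."""
    return -0.5 * alpha ** 2 + (alpha - 1.0) / r


def vmc_energy(alpha, n_steps=50000, step=1.0, burn_in=2000, seed=1):
    """Metropolis sampling of |psi|^2 and averaging of the local energy."""
    rng = random.Random(seed)
    pos, r = [1.0, 0.0, 0.0], 1.0
    total, count = 0.0, 0
    for i in range(n_steps):
        trial = [x + step * (rng.random() - 0.5) for x in pos]
        r_trial = math.sqrt(sum(x * x for x in trial))
        # Accept with probability min(1, |psi(trial)|^2 / |psi(pos)|^2)
        if rng.random() < math.exp(-2.0 * alpha * (r_trial - r)):
            pos, r = trial, r_trial
        if i >= burn_in:
            total += local_energy(r, alpha)
            count += 1
    return total / count


if __name__ == "__main__":
    for alpha in (0.8, 0.9, 1.0, 1.1, 1.2):
        exact = 0.5 * alpha ** 2 - alpha  # analytic variational energy E(alpha)
        print(f"alpha={alpha:.1f}  VMC={vmc_energy(alpha):+.4f}  analytic={exact:+.4f}")
```

Every trial value of alpha gives an energy above -0.5 Ha, up to statistical noise. At alpha = 1 the trial function is exact, so the local energy is constant and its variance vanishes.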
In a number of recent applications [7, 8] it was found that the inclusion of nuclear quantum effects (NQE) considerably worsens the agreement between DFT predictions and experiments. This is ascribed to the inaccuracies of DFT, and it illustrates the importance of improving DFT functionals not with experimental data alone but also with calculations based on more fundamental methods. There has been a recent effort to establish the accuracy of DFT approximations by benchmarking them against QMC calculations, not only for equilibrium geometries but also for thermal configurations. This benchmarking can be customized for individual molecules at a given temperature, pressure and geometry [9, 10, 11, 12].
Another important aspect concerns finite-size effects in modelling extended systems. Although corrections can be developed for homogeneous systems, more complex situations with several characteristic length scales require system sizes that cannot be tackled by ab initio methods. In these applications one needs to use an effective interaction energy. A recent development is the use of Machine Learning (ML) techniques to obtain energy functions with ab initio accuracy [13, 14, 15]. Assessing their transferability and accuracy is still partly an open problem, but progress is rapid. A related development uses ML methods to bypass the Kohn-Sham paradigm of DFT and directly learn the potential-density map [16, 17, 18].

The following is a list of topics that will be discussed during the meeting:
• Benchmarking existing DFT functionals with QMC. DFT has the potential to be accurate, but the main limitation of its predictive power is that its accuracy can be system dependent. QMC was instrumental in developing the first exchange-correlation approximations (e.g. LDA), and we envisage that it can play a substantial role in the discovery and tuning of new functionals. In particular, the tuning of dispersion interactions appears to be a key element still not fully controlled in modern DFT approximations, while it plays a crucial role in many systems, such as hydrogen and hydrogen-based materials like water.
• ML approaches with QMC accuracy. Machine Learning (ML) has attracted significant interest recently, mainly because of its potential to study real-life systems and to explore phase space at a scale that is not accessible to ab initio methods. However, the quality of the training set is crucial for any ML method. It is often possible to train an ML potential on small systems, where accurate energies and forces can be obtained with quantum chemistry methods, but training sets that include larger systems are also needed. QMC has the potential to provide them, especially going forward with exascale computing.
• Opportunities for new exascale applications of QMC to impact simulations of larger systems and longer time scales. QMC is capable of exploiting parallelism very efficiently, and is probably one of the few methods already capable of running at the exascale level. ML methods on large data sets are also inherently parallel and directly usable on exascale machines.
• We will address the problem of using and testing force fields derived for small systems on systems of much larger size.
• We will discuss the use of ML methods to derive new classes of wave functions for QMC calculations of complex systems.

[1] J. Kolorenc and L. Mitas, Rep. Prog. Phys. 74, 1 (2010).
[2] L. K. Wagner and D. M. Ceperley, Rep. Prog. Phys. 79, 094501 (2016).
[3] M. Taddei, M. Ruggeri, S. Moroni, and M. Holzmann, Phys. Rev. B 91, 115106 (2015).
[4] J. Heyd, G. Scuseria, and M. Ernzerhof, J. Chem. Phys. 118, 8207 (2003).
[5] K. Lee, É. Murray, L. Kong, B. Lundqvist, and D. Langreth, Phys. Rev. B 82, 081101 (2010).
[6] K. Berland et al., Rep. Prog. Phys. 78, 066501 (2015).
[7] M. A. Morales, J. McMahon, C. Pierleoni, and D. M. Ceperley, Phys. Rev. Lett. 110, 065702 (2013).
[8] M. Rossi, G. P, and M. Ceriotti, Phys. Rev. Lett. 117, 115702 (2016).
[9] R. C. Clay et al., Phys. Rev. B 89, 184106 (2014).
[10] M. A. Morales et al., J. Chem. Theory Comput. 10, 2355 (2014).
[11] R. C. Clay, M. Holzmann, D. M. Ceperley, and M. A. Morales, Phys. Rev. B 93, 035121 (2016).
[12] M. J. Gillan, F. Manby, M. Towler, and D. Alfè, J. Chem. Phys. 136, 244105 (2012).
[13] K. V. J. Jose, N. Artrith, and J. Behler, J. Chem. Phys. 136, 194111 (2012).
[14] J. Behler, J. Chem. Phys. 145, 170901 (2016).
[15] V. Botu, R. Batra, J. Chapman, and R. Ramprasad, J. Phys. Chem. C 121, 511 (2016).
[16] J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, and K. Burke, Phys. Rev. Lett. 108, 253002 (2012).
[17] L. Li, T. E. Baker, S. R. White, and K. Burke, Phys. Rev. B 94, 245129 (2016).
[18] F. Brockherde et al., arXiv:1609.02815v3 (2017).


Symmetry Adapted Wannier Functions – a Component of Wannier90

 

Symmetry Adapted Wannier Functions is a module within Wannier90 devoted to the construction of Wannier functions (WFs) with a given symmetry. The procedure implemented in this module enables one to control the symmetry and center of the WFs and also simplifies the minimisation of the spread functional under these symmetry constraints.

This module is one of the nine modules reported in Deliverable D2.3, which together deal with: the implementation of symmetry-adapted WFs, to improve the symmetry of the WFs and of related electronic-structure quantities such as band structure and density of states; improvements in the interpolation of band structures; developments in the selection of the k-point mesh to increase accuracy; the ability to perform non-collinear spin calculations; and interface-layer modules to tight-binding codes.

Starting from an E-CAM Extended Software Development Workshop (ESDW3) in San Sebastian organised by the Wannier90 developers, a set of nine modules was produced to meet the desire of the electronic-structure community to extend the use of WFs, and in particular of Maximally Localised Wannier Functions (MLWFs), to a broader class of physical and chemical problems by adding new functionality to the Wannier90 code.

All modules are accessible through the Wannier90 code, which in turn is interfaced with all the most popular DFT codes. Wannier90 is used as a post-processing tool. Therefore, end users of electronic-structure codes, such as DFT, Tight Binding and Quantum Monte Carlo codes, that are interfaced with these modules via Wannier90 will benefit from the functionalities they provide, e.g. WFs with improved symmetry, spin-orbit calculations, etc., and can focus on developing new ideas and new science without needing to rewrite functionalities that are already established.

Practical application and exploitation of the code

Wannier functions are an important class of functions which enable one to obtain a real-space picture of the electronic structure of a system. They provide an insightful chemical analysis of the nature of bonding and of chemical reactions in condensed-matter systems, similar to the role played by localised molecular orbitals in chemistry. They are also a powerful tool in the study of dielectric properties via the modern theory of polarisation. In the condensed-matter community, WFs are employed in the construction of model Hamiltonians for, e.g., correlated-electron and magnetic systems (to study new quantum phases of matter), and are used as building blocks in first-principles Tight Binding Hamiltonians, where chemically accurate Hamiltonians are constructed directly on the Wannier basis rather than fitted or inferred from macroscopic considerations. [1]

Wannier90 [2] is a program that, for a given system, generates, among all possible WFs, those with minimum spatial spread, known as MLWFs. The locality of MLWFs can be exploited to compute, among other things, band structures, densities of states and Fermi surfaces at modest computational cost.
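For reference, the quantity minimised to obtain MLWFs is the Marzari-Vanderbilt spread functional, written here in the notation commonly used in the MLWF literature. The Wannier functions w_nR are built from the Bloch states psi_mk through k-dependent unitary matrices U^(k), which are the variables of the minimisation (for an isolated group of bands).

```latex
\Omega = \sum_{n} \left[ \langle w_{n\mathbf{0}} | r^{2} | w_{n\mathbf{0}} \rangle
        - \left| \langle w_{n\mathbf{0}} | \mathbf{r} | w_{n\mathbf{0}} \rangle \right|^{2} \right],
\qquad
| w_{n\mathbf{R}} \rangle = \frac{V}{(2\pi)^{3}} \int_{\mathrm{BZ}} d\mathbf{k}\,
    e^{-i\mathbf{k}\cdot\mathbf{R}} \sum_{m} U^{(\mathbf{k})}_{mn} \, | \psi_{m\mathbf{k}} \rangle
```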

The developed modules have been used to study the properties of strongly correlated materials and to assess the quality of high-level quantum methods. [3]

 

[1] N. Marzari, A. A. Mostofi, J. R. Yates, I. Souza, D. Vanderbilt, Maximally localized Wannier functions: Theory and applications, Rev. Mod. Phys. 84 (2012) 1419

[2] A. A. Mostofi, J. R. Yates, Y.-S. Lee, I. Souza, D. Vanderbilt, N. Marzari, wannier90: A tool for obtaining maximally-localised Wannier functions, Comput. Phys. Commun. 178 (2008) 685

[3] L. Boehnke, F. Nilsson, F. Aryasetiawan, P. Werner, When strong correlations become weak: Consistent merging of GW and DMFT, Phys. Rev. B 94 (2016) 201106


New report published: Identification / Selection of E-CAM Electronic Structure Codes for Development

 

Read our latest report on the state-of-the-art codes and methods in Quantum Monte Carlo, Density Functional Theory (DFT) and beyond-DFT methods. The report reviews the software available in these areas and the basic features that most of these codes have in common, with a view to modularisation. On that basis, it discusses a list of software development projects to be undertaken by E-CAM.

Full report available here.


Scientific reports from the 2017 E-CAM workshops are now available on our website

 

The scientific reports* from the following workshops conducted in year 2 of the project E-CAM (2017):

  1. E-CAM Scoping Workshop: “From the Atom to the Material”, 18–20 September 2017, University of Cambridge, UK,
  2. E-CAM State-of-the-Art Workshop WP4: Meso and Multiscale Modelling, 29 May – 1 June 2017, University College Dublin, Ireland,

are now available for download on our website at this location. They will also be included in the CECAM Report of Activities 2017, published every year on the website www.cecam.org.

Each report includes:

  • an overview of the remit of the workshop,
  • the workshop program,
  • the list of attendees,
  • the major outcomes,
  • how these outcomes relate to community needs,
  • how the recommendations could be funded,
  • how they relate to society and industry,
  • and their emphasis and impact on software development.

 

*© CECAM 2017, all rights reserved.

Please address any comments or questions to info@e-cam2020.eu.


Metal-ion force field developed by E-CAM using novel Machine Learning procedure is now available for download

 

The database of force fields developed by the SNS SMART group (SNS, Pisa, Italy), including the metal-ion force fields optimized within E-CAM using a novel machine-learning procedure (reported in a recent publication[1] and in a case study reported by E-CAM here), is now available for download at http://smart.sns.it/vmd_molecules/.

[1] Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone, Force Field Parametrization of Metal Ions from Statistical Learning Techniques, J. Chem. Theory Comput. 2018, 14, 255−273 DOI: 10.1021/acs.jctc.7b00779

 


The simulation of metal ions in protein-water systems using machine learning: An E-CAM case study and conversation

 

With Dr. Francesco Fracchia, Scuola Normale Superiore di Pisa

Interviewer: Dr. Donal Mackernan, University College Dublin

 

Abstract

One quarter to one third of all proteins require metals to function, but the description of metal ions in standard force fields is still quite primitive. This case study and interview describe an E-CAM project to develop a suitable parameterisation using machine learning. The training scheme combines classical simulation with electronic structure calculations to produce a force field comprising standard classical force fields with additional terms for the metal ion-water and metal ion-protein interactions. The approach allows simulations to run as fast as with standard molecular dynamics codes, and is suitable for efficient scale-up on massively parallel machines.



GRASP Sampling – a module to build a representative data set for a fitting procedure

GRASP_sampling performs a stratified sampling of the configurations of a system, each described by a vector, to build a representative training set for a fitting procedure. Given a list of candidate configurations and the required size N of the training set, the module performs the combinatorial optimization that maximizes the following dissimilarity score (DS) among the elements of the training set:

[Figure: formula for the dissimilarity score (DS); image not reproduced]

In this formula, the j-th configuration in the sum is the j-th nearest one to the l-th configuration, and d_lj is the Euclidean distance between the l-th and j-th configurations. M is the number of nearest configurations considered in the score. The exponential weight makes the score nearly independent of the particular value of M, provided M is larger than about 4-6.

The combinatorial optimization that maximizes the dissimilarity score is performed using the greedy randomized adaptive search procedure (GRASP) algorithm[1]. A stratified sampling can be performed without combinatorial optimization using classical statistical techniques (for example, Latin hypercube sampling); GRASP sampling becomes useful when the selection is restricted to a predetermined set of configurations, generated or sampled with specific internal constraints. This is the case for the molecular configurations generated in a molecular dynamics simulation.
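A minimal sketch of GRASP-based stratified sampling follows. Because the DS formula is only available as an image, the score here assumes a hypothetical form, DS = sum over l of sum over j = 1..M of exp(-(j-1)) * d_lj, with d_lj the distance from configuration l to its j-th nearest neighbour inside the training set. The weights actually used by GRASP_sampling may differ. Each GRASP iteration builds a set with a randomised greedy step, in which candidates scoring within a fraction alpha of the best form a restricted candidate list. It then improves the set by swapping selected and unselected configurations, and the best set over all iterations is kept. All function names and parameters are illustrative and are not the module's interface.

```python
import math
import random


def dissimilarity_score(points, selected, m):
    """Hypothetical DS: for each l, sum exp(-(j-1)) * d_lj over its M nearest selected neighbours."""
    score = 0.0
    for l in selected:
        dists = sorted(math.dist(points[l], points[k]) for k in selected if k != l)
        for j, d in enumerate(dists[:m]):  # j = 0 is the nearest neighbour
            score += math.exp(-j) * d
    return score


def construct(points, n, m, alpha, rng):
    """Randomised greedy construction using a restricted candidate list (RCL)."""
    selected = [rng.randrange(len(points))]
    while len(selected) < n:
        rest = [c for c in range(len(points)) if c not in selected]
        gains = {c: dissimilarity_score(points, selected + [c], m) for c in rest}
        best, worst = max(gains.values()), min(gains.values())
        threshold = best - alpha * (best - worst)
        rcl = [c for c in rest if gains[c] >= threshold]
        selected.append(rng.choice(rcl))
    return selected


def local_search(points, selected, m):
    """First-improvement swap search: replace one selected point by an unselected one."""
    selected = list(selected)
    current = dissimilarity_score(points, selected, m)
    improved = True
    while improved:
        improved = False
        outside = [c for c in range(len(points)) if c not in selected]
        for i in range(len(selected)):
            for c in outside:
                trial = selected[:i] + [c] + selected[i + 1:]
                score = dissimilarity_score(points, trial, m)
                if score > current + 1e-12:
                    selected, current, improved = trial, score, True
                    break
            if improved:
                break
    return selected, current


def grasp_sampling(points, n, m=5, alpha=0.3, iterations=20, seed=0):
    """Return (sorted indices of the best training set found, its DS)."""
    rng = random.Random(seed)
    best_set, best_score = None, -math.inf
    for _ in range(iterations):
        candidate, score = local_search(points, construct(points, n, m, alpha, rng), m)
        if score > best_score:
            best_set, best_score = candidate, score
    return sorted(best_set), best_score


if __name__ == "__main__":
    # Hypothetical 2D descriptor vectors for 40 configurations from an MD trajectory
    gen = random.Random(42)
    configs = [(gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)) for _ in range(40)]
    chosen, ds = grasp_sampling(configs, n=6, m=5, iterations=5)
    print("training set:", chosen, "DS =", round(ds, 3))
```

The alpha parameter trades greediness (alpha = 0) against randomness (alpha = 1) in the construction phase, while the swap search guarantees that each returned set is at least a local optimum of the score.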

The complete module documentation, including a link to the source code, can be found in our repository here

Motivation and exploitation

The application of the GRASP algorithm to perform a stratified sampling is described in a recent publication [2] by the E-CAM partners at Scuola Normale Superiore (SNS), that we previously reported here.

The motivation behind this software module is the pilot project with industry “Quantum Mechanical Parameterisation of Metal Ions in Proteins”, carried out by an E-CAM postdoctoral researcher from SNS.

 

[1] Feo, T. A.; Resende, M. G. Greedy randomized adaptive search procedures. J. Glob. Optim. 1995, 6, 109−133

[2] Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone, Force Field Parametrization of Metal Ions from Statistical Learning Techniques, J. Chem. Theory Comput. 2018, 14, 255−273


New article is out: “Force Field Parametrization of Metal Ions from Statistical Learning Techniques”

 

This paper from E-CAM partners working in Scuola Normale Superiore (Pisa, Italy) describes a novel statistical procedure, developed to optimize the parameters of non-bonded force fields of metal ions in soft matter. The paper is open access and can be downloaded directly from ACS’s page at http://pubs.acs.org/doi/10.1021/acs.jctc.7b00779.

This work was performed in the context of the E-CAM pilot project on Quantum Mechanical Parameterisation of Metal Ions in Proteins, which is a collaboration with BiKi Technologies. The list of software modules associated with the pilot project (and this publication) can be found here.

Article

Title: Force Field Parametrization of Metal Ions from Statistical Learning Techniques

Authors: Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone

Abstract: A novel statistical procedure has been developed to optimize the parameters of non-bonded force fields of metal ions in soft matter. The criterion for the optimization is the minimization of the deviations from ab initio forces and energies calculated for model systems. The method exploits the combination of the linear ridge regression and the cross-validation techniques with the differential evolution algorithm. Wide freedom in the choice of the functional form of the force fields is allowed since both linear and non-linear parameters can be optimized. In order to maximize the information content of the data employed in the fitting procedure, the composition of the training set is entrusted to a combinatorial optimization algorithm which maximizes the dissimilarity of the included instances. The methodology has been validated using the force field parametrization of five metal ions (Zn2+, Ni2+, Mg2+, Ca2+, and Na+) in water as test cases.
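The linear part of the fitting strategy described in the abstract can be sketched in plain Python. It obtains the linear parameters by ridge regression and chooses the regularisation strength by cross-validation. This is a sketch under stated assumptions, not the authors' code. The Lennard-Jones-like functional form, the synthetic reference energies, and the function names are all hypothetical. The outer differential-evolution loop that the paper uses for the non-linear parameters is not shown; in such a scheme, each trial set of non-linear parameters would define the feature matrix X passed to the linear fit.

```python
def _solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Linear coefficients w minimising ||X w - y||^2 + lam * ||w||^2."""
    p = len(X[0])
    a = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return _solve(a, b)


def cv_error(X, y, lam, k=5):
    """Mean squared k-fold cross-validation error (fold f holds out every k-th sample)."""
    n = len(X)
    sq_err = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train = [i for i in range(n) if i not in held_out]
        w = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in held_out:
            pred = sum(wj * xj for wj, xj in zip(w, X[i]))
            sq_err += (pred - y[i]) ** 2
    return sq_err / n


def select_lambda(X, y, lambdas, k=5):
    """Regularisation strength with the lowest cross-validation error."""
    return min(lambdas, key=lambda lam: cv_error(X, y, lam, k))


if __name__ == "__main__":
    # Hypothetical reference data E(r) = A/r^12 - B/r^6 with A = 1, B = 2 (arbitrary units),
    # standing in for ab initio ion-water energies; the paper's functional forms differ.
    rs = [0.95 + 0.05 * i for i in range(22)]
    X = [[r ** -12, -(r ** -6)] for r in rs]
    y = [r ** -12 - 2.0 * r ** -6 for r in rs]
    lam = select_lambda(X, y, [1e-8, 1e-4, 1e-2, 1.0])
    print("chosen lambda:", lam, "coefficients (A, B):", ridge_fit(X, y, lam))
```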

 


Geomoltools: A set of software modules to easily manipulate molecular geometries

Geomoltools is a set of eight pre- and post-processing Fortran codes that can be used to easily manipulate molecular geometries, allowing one to minimize the average energy obtained for a range of internuclear distances for the dimers of each element, and to decrease the computational cost of a DFT calculation.

The set of codes are:

  • mol2xyz: converts a .mol file into an ordered .xyz file
  • pastemol: joins two .xyz files
  • movemol: translates and aligns the molecule with some predefined axes
  • stackmol: generates (manually or randomly) different stacking arrangements between two molecules
  • geodiff: compares the internal coordinates of two molecules
  • xyz2zmt_s: converts the Cartesian coordinates contained in a .xyz file into a Z-matrix (2 possible formats)
  • zmt2xyz_s: converts a Z-matrix (from 2 possible formats) into Cartesian coordinates
  • ucubcellgen: calculates the vectors of a unit cell given some atomic coordinates.

The source code of the modules can be found here. For a detailed explanation of the main programs, please have a look at this file. A complete tutorial on how to use the different codes from the Geomoltools package to manipulate (rotate, translate, join, pack, convert, etc.) molecular geometries can be found at this address.

Motivation and exploitation

These modules have been used to study the stacking arrangements of acceptor:donor molecules in organic photovoltaic polymers by high-throughput computation with the SIESTA code. This set of codes is available under the GNU General Public License (GPL) version 2.
