Integration of ESL modules into electronic-structure codes


If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

The evolutionary pressure on electronic structure software development is greatly increasing, due to the emergence of new paradigms, new kinds of users, new processes, and new tools. Electronic structure software complexity is consequently also increasing, requiring a larger effort on code maintenance. Developers of large electronic structure codes are trying to relieve some complexity by transitioning standardized algorithms into separate libraries [BigDFT-PSolver, ELPA, ELSI, LibXC, LibGridXC, etc.]. This paradigm shift requires library developers to have a hybrid developer profile in which scientific and computational skills are equally important. These topics have been extensively and publicly discussed between developers of various projects including ABINIT, ASE, ATK, BigDFT, CASTEP, FHI-aims, GPAW, Octopus, Quantum ESPRESSO, SIESTA, and SPR-KKR.

High-quality standardized libraries are not only a highly challenging effort resting in the hands of the library developers; they also open possibilities for codes to take advantage of a standard way to access commonly used algorithms. Integration of these libraries, however, requires a significant initial effort that is often sacrificed for new developments that frequently do not even reach the mainstream branch of the code. Additionally, there are multiple challenges in adopting new libraries, which have their roots in a variety of issues: installation, data structures, physical units and parallelism – all of which are code-dependent. On the other hand, adoption of common libraries ensures the immediate propagation of improvements within the respective library’s field of research and ensures codes are up-to-date with much less effort [LibXC]. Indeed, well-established libraries can have a huge impact on multiple scientific communities at once [PETSc].

In the Electronic Structure community, two issues are emerging. First, libraries are being developed [esl, esl-gitlab] but require an ongoing commitment from the community with respect to sharing the maintenance and development effort. Second, existing codes will benefit from libraries by adopting their use. Both issues are mainly governed by the exposure of the libraries and the availability of library core developers, who are typically researchers pressured by publication deliverables and fund-raising burdens. They are thus not able to commit a large fraction of their time to software development.

An effort to allow code developers to make use of, and develop, shared components is needed. This requires an efficient coordination between various elements:

– A common and consistent code development infrastructure/education in terms of compilation, installation, testing and documentation.
– How to use and integrate already published libraries into existing projects.
– Creating long-lasting synergies between developers to reach a “critical mass” of component contributors.
– Relevant quality metrics (“TRLs” and “SRLs”), to provide businesses with useful information.

This is what the Electronic Structure Library (ESL) [esl, esl-gitlab] has been doing since 2014, with a wiki, a data-exchange standard, refactoring code of global interest into integrated modules, and regularly organizing workshops, within a wider movement led by the European eXtreme Data and Computing Initiative [exdci].






Module SCDM_WFs implements the selected columns of the density matrix (SCDM) method [1] for building localized Wannier functions (WFs). Wannier90 [2] is a post-processing tool for the computation of Maximally Localised Wannier Functions (MLWFs) [3,4,5], which have been increasingly adopted by the electronic structure community for different purposes. The reasons are manifold: MLWFs provide an insightful chemical analysis of the nature of bonding and of its evolution during, say, a chemical reaction. They play for solids a role similar to that of localized orbitals in molecular systems. In the condensed matter community, they are used in the construction of model Hamiltonians for, e.g., correlated-electron and magnetic systems. They are also pivotal in first-principles tight-binding Hamiltonians, where chemically accurate Hamiltonians are constructed directly on the Wannier basis rather than fitted or inferred from macroscopic considerations. Other applications include dielectric response and polarization in materials, ballistic transport, analysis of phonons, photonic crystals, cold atom lattices, and the local dielectric responses of insulators; see [3] for a review. This module is a first step towards the automation of MLWFs. In the original Wannier90 framework, automation of MLWFs is hindered by the difficult step of choosing a set of initial localized functions with the correct symmetries and centers to use as an initial guess for the optimization. As a result, high-throughput calculations (HTC) and big data analysis with MLWFs have proved problematic to implement.

This module is part of the newly developed Wannier90 utilities within the pilot project on Electronic Structure Functionalities for Multi-Thread Workflows. The module is part of the pw2wannier interface between the popular QUANTUM ESPRESSO code and Wannier90. It will be part of the next versions of QUANTUM ESPRESSO (v6.3) and Wannier90. Moreover, it has been successfully added to a developer branch of the AiiDA workflow [6] to perform HTC on large material datasets.

Practical application and exploitation of the code

The SCDM-k method [1] removes the need for an initial guess altogether by using information contained in the single-particle density matrix. In fact, the columns of the density matrix are localized in real space and can be used as a vocabulary to build the localized WFs. The SCDM-k method can be used in isolation to generate well-localized WFs. More interesting is the possibility of coupling the SCDM-k method to Wannier90. The core idea is to use WFs generated by the SCDM-k method as an initial guess in the optimization procedure within Wannier90. This module is a big step towards the automation of WFs and the simplification of the use of the Wannier90 program. The module is therefore intended for all scientists who benefit from the use of WFs in their research. Furthermore, by making the code more accessible and easier to use, this module is expected to increase the popularity of the Wannier90 code.
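
The column-selection idea can be illustrated with a small toy sketch. It is not the module's implementation: it assumes real-valued orbitals at a single k-point on a six-point grid, the function names `select_columns` and `density_matrix_column` are hypothetical, and the pivoted QR factorization usually used for this step is replaced by an equivalent greedy pivoted Gram-Schmidt so that only the Python standard library is needed. The final orthonormalization of the selected columns is omitted.

```python
import math


def select_columns(rows, n_select):
    """Greedy column-pivoted Gram-Schmidt on a matrix given as a list of rows.

    Returns the indices of the n_select columns chosen as pivots, i.e. the
    grid points whose density-matrix columns are the most linearly independent.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    # Work on a column-wise copy so the input is left untouched.
    cols = [[rows[i][j] for i in range(n_rows)] for j in range(n_cols)]
    chosen = []
    for _ in range(n_select):
        norms = [sum(x * x for x in col) for col in cols]
        pivot = max(range(n_cols), key=norms.__getitem__)
        chosen.append(pivot)
        scale = math.sqrt(norms[pivot])
        q = [x / scale for x in cols[pivot]]
        # Remove the component along q from every column.
        for col in cols:
            overlap = sum(a * b for a, b in zip(q, col))
            for i in range(n_rows):
                col[i] -= overlap * q[i]
    return chosen


def density_matrix_column(bands, c):
    """Column c of P(r, r') = sum_n psi_n(r) psi_n(r') for real orbitals."""
    n_grid = len(bands[0])
    return [sum(band[r] * band[c] for band in bands) for r in range(n_grid)]


# Toy system (assumption): two delocalized "bands" built from two localized bumps.
s = 1.0 / math.sqrt(2.0)
left = [0.6, 0.8, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.8, 0.6, 0.0]
bands = [
    [s * (a + b) for a, b in zip(left, right)],
    [s * (a - b) for a, b in zip(left, right)],
]

selected = select_columns(bands, n_select=2)
for c in selected:
    # Each selected column is localized on one of the two bumps.
    print(c, [round(v, 3) for v in density_matrix_column(bands, c)])
```

In this toy case the two selected density-matrix columns are each confined to one bump even though both input bands are spread over the whole grid, which is the property the method exploits.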

[1] A. Damle, L. Lin, L. Ying, SCDM-k: Localized orbitals for solids via selected columns of the density matrix, J. Comput. Phys. 334 (2017) 1
[2] A. A. Mostofi, J. R. Yates, Y.-S. Lee, I. Souza, D. Vanderbilt, N. Marzari, wannier90: A tool for obtaining maximally-localised Wannier functions, Comput. Phys. Commun. 178 (2008) 685
[3] N. Marzari, A. A. Mostofi, J. R. Yates, I. Souza, D. Vanderbilt, Maximally localized Wannier functions: Theory and applications, Rev. Mod. Phys. 84 (2012) 1419
[4] N. Marzari, D. Vanderbilt, Maximally localized generalized Wannier functions for composite energy bands, Phys. Rev. B 56 (1997) 12847
[5] I. Souza, N. Marzari, D. Vanderbilt, Maximally localized Wannier functions for entangled energy bands, Phys. Rev. B 65 (2001) 035109
[6] G. Pizzi, A. Cepellotti, R. Sabatini, N. Marzari, B. Kozinsky, AiiDA: automated interactive infrastructure and database for computational science, Comput. Mater. Sci. 111 (2016) 218


Symmetry Adapted Wannier Functions – a Component of Wannier90


Symmetry Adapted Wannier Functions is a module within Wannier90 devoted to the construction of Wannier functions (WFs) with a given symmetry. The procedure implemented in this module enables one to control the symmetry and center of the WFs and also simplifies the minimisation of the spread functional under these symmetry constraints.

This module is one of the nine modules reported in Deliverable D2.3, which together deal with: the implementation of symmetry-adapted WFs, to improve the symmetry of the WFs and of related electronic-structure quantities such as band structure and density of states; improvements in the interpolation of band structures; developments in the selection of the k-point mesh to increase accuracy; the ability to perform non-collinear spin calculations; and interface layer modules to tight-binding codes.

Starting from an E-CAM ESDW3 in San Sebastian organised by the Wannier90 developers, a set of nine modules was produced to meet the desire of the electronic-structure community to extend the use of WFs, and in particular of Maximally Localised Wannier Functions (MLWFs), to a broader class of physical and chemical problems by adding new functionality to the Wannier90 code.

All modules are accessible through the Wannier90 code, which in turn is interfaced with all the most popular DFT codes. Wannier90 is used as a post-processing tool. Therefore, the end users of electronic-structure codes, such as DFT, Tight Binding and Quantum Monte Carlo codes, that are interfaced with these modules via Wannier90 will benefit from the functionalities they provide, e.g. WFs with improved symmetry, spin-orbit calculations etc., and can focus on developing new ideas and new science without needing to rewrite functionalities that are already established.

Practical application and exploitation of the code

Wannier functions are an important class of functions which enable one to obtain a real-space picture of the electronic structure of a system. They provide an insightful chemical analysis of the nature of bonding and of chemical reactions in condensed-matter physics, similar to the role played by localised molecular orbitals in chemistry. They are also a powerful tool in the study of dielectric properties via the modern theory of polarisation. In the condensed-matter community WFs are employed in the construction of model Hamiltonians for, e.g., correlated-electron and magnetic systems (to study new quantum phases of matter) and are used as building blocks in first-principles Tight Binding Hamiltonians, where chemically accurate Hamiltonians are constructed directly on the Wannier basis, rather than fitted or inferred from macroscopic considerations [1].

Wannier90 [2] is a program that, for a given system, generates the Wannier functions with minimum spatial spread, known as MLWFs, among the class of all possible WFs. The locality of MLWFs can be exploited to compute, among other things, band structures, densities of states and Fermi surfaces at modest computational cost.
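
For orientation, the spatial spread referred to here is usually written as the sum of the second central moments of the WFs in the home unit cell. The block below gives this standard Marzari-Vanderbilt form; the symbols (w_{0n} for the n-th WF in the home cell, N for the number of WFs) are notation chosen for this note rather than taken from the text above.

```latex
\Omega \;=\; \sum_{n=1}^{N}
  \left[ \langle w_{0n} \,|\, r^{2} \,|\, w_{0n} \rangle
       - \left| \langle w_{0n} \,|\, \mathbf{r} \,|\, w_{0n} \rangle \right|^{2} \right]
```

MLWFs are the WFs obtained by minimising \Omega with respect to the unitary mixing of the Bloch states at each k-point.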

The developed modules have been used to study the properties of strongly correlated materials and to assess the quality of high-level quantum methods [3].


[1] N. Marzari, A. A. Mostofi, J. R. Yates, I. Souza, D. Vanderbilt, Maximally localized Wannier functions: Theory and applications, Rev. Mod. Phys. 84 (2012) 1419

[2] A. A. Mostofi, J. R. Yates, Y.-S. Lee, I. Souza, D. Vanderbilt, N. Marzari, wannier90: A tool for obtaining maximally-localised Wannier functions, Comput. Phys. Commun. 178 (2008) 685

[3] L. Boehnke, F. Nilsson, F. Aryasetiawan, P. Werner, When strong correlations become weak: Consistent merging of GW and DMFT, Phys. Rev. B 94 (2016) 201106


New report published: Identification / Selection of E-CAM Electronic Structure Codes for Development


Read our latest report on the state of the art codes and methods in Quantum Monte Carlo, Density Functional Theory (DFT) and beyond DFT methods. This report contains a review of the software available in these areas and on the basic features that the majority of these codes have in common with a view to modularisation. Based on that, a list of software development projects to be developed by E-CAM is discussed.

Full report available here.


Scientific reports from the 2017 E-CAM workshops are now available on our website


The scientific reports* from the following workshops conducted in year 2 of the project E-CAM (2017):

  1. E-CAM Scoping Workshop: “From the Atom to the Material”, 18-20 September 2017, University of Cambridge, UK,
  2. E-CAM State-of-the-Art Workshop WP4: Meso and Multiscale Modelling, 29 May – 1 June 2017, University College Dublin, Ireland,

are now available for download on our website at this location. Furthermore, they will also be included in the CECAM Report of Activities 2017, published every year on the website.

Each report includes:

  • an overview of the remit of the workshop,
  • the workshop program,
  • the list of attendees,
  • the major outcomes,
  • how these outcomes relate to community needs,
  • how the recommendations could be funded,
  • how they relate to society and industry,
  • and the emphasis and impact on software development.


*© CECAM 2017, all rights reserved.

Please address any comments or questions to


Metal-ion force field developed by E-CAM using a novel Machine Learning procedure is now available for download


The database of the force fields developed by the SNS SMART group (SNS, Pisa, Italy), including the metal-ion force fields optimized within E-CAM using a novel Machine Learning procedure (reported in a recent publication [1] and in a case study reported by E-CAM here), is now available for download at

[1] Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone, Force Field Parametrization of Metal Ions from Statistical Learning Techniques, J. Chem. Theory Comput. 2018, 14, 255−273 DOI: 10.1021/acs.jctc.7b00779



The simulation of metal ions in protein-water systems using machine learning: An E-CAM case study and conversation


With Dr. Francesco Fracchia, Scuola Normale Superiore di Pisa

Interviewer: Dr. Donal Mackernan, University College Dublin



One quarter to one third of all proteins require metals to function but the description of metal ions in standard force fields is still quite primitive. In this case study and interview an E-CAM project to develop a suitable parameterisation using machine learning is described.  The training scheme combines classical simulation with electronic structure calculations to produce a force field comprising standard classical force fields with additional terms for the metal ion-water and metal ion-protein interactions. The approach allows simulations to run as fast as standard molecular dynamics codes, and is suitable for efficient massive parallelism scale-up.



GRASP Sampling – a module to build a representative data set for a fitting procedure

GRASP_sampling performs a stratified sampling of the configurations of a system, each described by a vector, to build a representative training set for a fitting procedure. Given a list of candidate configurations and the required size (N) of the training set, the module executes the combinatorial optimization that maximizes the following dissimilarity score (DS) among the elements of the training set:


In this formula, the j-th configuration in the sum is the j-th nearest one to the l-th configuration, and d_lj is the Euclidean distance between the l-th and j-th configurations. M is the number of nearest configurations considered in the score. The exponential weight makes the score nearly independent of the particular value of M, provided it is larger than 4-6.

The combinatorial optimization that maximizes the dissimilarity score is performed using the greedy randomized adaptive search procedure (GRASP) algorithm [1]. A stratified sampling can be performed without a combinatorial optimization using classical statistical techniques (for example, Latin hypercube sampling); GRASP sampling becomes useful when the selection is restricted to a predetermined set of configurations, generated or sampled with specific internal constraints. This is the case for molecular configurations generated in a molecular dynamics simulation.
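
A minimal sketch of how such a selection could look is given below. It is not the module's code. In particular, the score is an assumption: each training-set member contributes the distances to its M nearest fellow members weighted by 2^(-j), which is one possible exponentially decaying weight and may differ from the one used by GRASP_sampling. The function names, the restricted-candidate-list size and the iteration count are likewise hypothetical.

```python
import math
import random


def dissimilarity(points, subset, m=4):
    """Assumed score: for each member l, sum_j 2**-j * d_lj over its m nearest members."""
    total = 0.0
    for l in subset:
        dists = sorted(math.dist(points[l], points[k]) for k in subset if k != l)
        total += sum(d / 2 ** j for j, d in enumerate(dists[:m], start=1))
    return total


def grasp_sample(points, n, iterations=20, rcl_size=3, m=4, seed=0):
    """Pick n of the candidate points by GRASP: randomized greedy build + swap search."""
    rng = random.Random(seed)
    indices = range(len(points))
    best, best_score = None, -math.inf
    for _ in range(iterations):
        # Construction phase: grow the subset, choosing at random among the
        # rcl_size candidates that increase the score the most.
        subset = [rng.randrange(len(points))]
        while len(subset) < n:
            candidates = [i for i in indices if i not in subset]
            candidates.sort(
                key=lambda i: dissimilarity(points, subset + [i], m), reverse=True
            )
            subset.append(rng.choice(candidates[:rcl_size]))
        # Local search: accept single-element swaps while they improve the score.
        score = dissimilarity(points, subset, m)
        improved = True
        while improved:
            improved = False
            for pos in range(n):
                for cand in indices:
                    if cand in subset:
                        continue
                    trial = subset[:pos] + [cand] + subset[pos + 1:]
                    trial_score = dissimilarity(points, trial, m)
                    if trial_score > score + 1e-12:
                        subset, score, improved = trial, trial_score, True
        if score > best_score:
            best, best_score = sorted(subset), score
    return best, best_score


# Toy candidates (assumption): two tight clusters and one outlier on a line.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (10.0,)]
best, score = grasp_sample(points, n=3)
print(best, score)  # for this toy set the spread-out selection [0, 3, 5] is returned
```

The restricted candidate list is what distinguishes GRASP from a plain greedy construction: different iterations start from different partial solutions, and the swap search then refines each of them.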

The complete module documentation, including a link to the source code, can be found in our repository here

Motivation and exploitation

The application of the GRASP algorithm to perform a stratified sampling is described in a recent publication [2] by the E-CAM partners at Scuola Normale Superiore (SNS), that we previously reported here.

The motivation behind this software module is the pilot project with industry “Quantum Mechanical Parameterisation of Metal Ions in Proteins” sustained by an E-CAM postdoctoral researcher from SNS.


[1] Feo, T. A.; Resende, M. G. Greedy randomized adaptive search procedures. J. Glob. Optim. 1995, 6, 109−133

[2] Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone, Force Field Parametrization of Metal Ions from Statistical Learning Techniques, J. Chem. Theory Comput. 2018, 14, 255−273


New article is out: “Force Field Parametrization of Metal Ions from Statistical Learning Techniques”


This paper from E-CAM partners working in Scuola Normale Superiore (Pisa, Italy) describes a novel statistical procedure, developed to optimize the parameters of non-bonded force fields of metal ions in soft matter. The paper is open access and can be downloaded directly from ACS’s page at

This work was performed in the context of the E-CAM pilot project on Quantum Mechanical Parameterisation of Metal Ions in Proteins, which is a collaboration with BiKi Technologies. The list of software modules associated with the pilot project (and this publication) can be found here.


Title: Force Field Parametrization of Metal Ions from Statistical Learning Techniques

Authors: Francesco Fracchia, Gianluca Del Frate, Giordano Mancini, Walter Rocchia, and Vincenzo Barone

Abstract: A novel statistical procedure has been developed to optimize the parameters of non-bonded force fields of metal ions in soft matter. The criterion for the optimization is the minimization of the deviations from ab initio forces and energies calculated for model systems. The method exploits the combination of the linear ridge regression and the cross-validation techniques with the differential evolution algorithm. Wide freedom in the choice of the functional form of the force fields is allowed since both linear and non-linear parameters can be optimized. In order to maximize the information content of the data employed in the fitting procedure, the composition of the training set is entrusted to a combinatorial optimization algorithm which maximizes the dissimilarity of the included instances. The methodology has been validated using the force field parametrization of five metal ions (Zn²⁺, Ni²⁺, Mg²⁺, Ca²⁺, and Na⁺) in water as test cases.



Geomoltools: A set of software modules to easily manipulate molecular geometries

Geomoltools is a set of eight pre- and post-treatment Fortran codes that can be used to easily manipulate molecular geometries, allowing one to minimize the average energy obtained for a range of internuclear distances for the dimers of each element and to decrease the computational cost of a DFT calculation.

The codes are:

  • mol2xyz: converts a .mol file into an ordered .xyz file
  • pastemol: joins two .xyz files
  • movemol: translates and aligns the molecule with some predefined axes
  • stackmol: generates (manually or randomly) different stacking arrangements between two molecules
  • geodiff: compares the internal coordinates of two molecules
  • xyz2zmt_s: converts the cartesian coordinates contained in a .xyz file into Z-matrix (2 possible formats)
  • zmt2xyz_s: converts a Z-matrix (from 2 possible formats) into cartesian coordinates
  • ucubcellgen: calculates the vectors of a unit cell given some atomic coordinates.

The source code of the modules can be found here. For a detailed explanation of the main programs, please have a look at this file. A complete tutorial on how to use the different codes from the Geomoltools package to manipulate (rotate, translate, join, pack, convert, etc.) molecular geometries can be found at this address.
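
To give a flavour of the kind of manipulation involved, the sketch below reads a molecule in the common XYZ text layout (atom count, comment line, then one `symbol x y z` line per atom), centres it, and stacks a second molecule above it. It is a Python illustration only, not a transcription of the Fortran codes: the function names `parse_xyz`, `centre`, `stack` and `format_xyz` are hypothetical, and the actual options and conventions of pastemol, movemol and stackmol are described in the package documentation.

```python
def parse_xyz(text):
    """Parse XYZ-format text into a list of (symbol, (x, y, z)) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    # Line 0 is the atom count and line 1 the comment; coordinates follow.
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms


def centre(atoms):
    """Translate a molecule so that its geometric centre sits at the origin."""
    n = len(atoms)
    centroid = [sum(xyz[k] for _, xyz in atoms) / n for k in range(3)]
    return [(s, tuple(c - c0 for c, c0 in zip(xyz, centroid))) for s, xyz in atoms]


def stack(mol_a, mol_b, separation):
    """Join two centred molecules, placing mol_b `separation` above mol_a along z."""
    lower = centre(mol_a)
    upper = [(s, (x, y, z + separation)) for s, (x, y, z) in centre(mol_b)]
    return lower + upper


def format_xyz(atoms, comment=""):
    """Serialise atoms back to XYZ-format text."""
    body = [f"{s} {x:.6f} {y:.6f} {z:.6f}" for s, (x, y, z) in atoms]
    return "\n".join([str(len(atoms)), comment] + body) + "\n"


# Toy input (assumption): an H2 molecule stacked on a copy of itself.
h2 = parse_xyz("2\nH2 molecule\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74\n")
dimer = stack(h2, h2, separation=3.5)
print(format_xyz(dimer, comment="stacked H2 dimer"))
```

Centring both molecules before stacking keeps the separation parameter independent of where each input file happened to place its atoms.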

Motivation and exploitation

These modules have been used to study the stacking arrangements of acceptor:donor molecules for organic photovoltaic polymers by high-throughput computation with the SIESTA code. This set of codes is available under the GNU General Public License (GPL) version 2.