Accelerating the design and discovery of materials with tailored properties using first principles high-throughput calculations and automated generation of Wannier functions


A successful collaboration between the EU H2020 E-CAM and MaX Centres of Excellence, and the Swiss NCCR MARVEL


In a recent paper[1], researchers from the Centres of Excellence E-CAM[2] and MaX[3], and from the centre for Computational Design and Discovery of Novel Materials NCCR MARVEL[4], proposed a new procedure for automatically generating maximally-localised Wannier functions (MLWFs) within high-throughput frameworks. The methodology and associated software can handle the hitherto difficult case of entangled bands, and allow the electronic properties of a wide variety of materials, including insulators, semiconductors and metals, to be obtained starting only from the specification of the initial crystal structure. Industrial applications that this work will facilitate include the development of novel superconductors, multiferroics and topological insulators, as well as more traditional electronic applications.

Graphical representation of all data and calculations run in the project and their interconnections (provenance), as tracked automatically by AiiDA in the form of a directed acyclic graph (image credits: G. Pizzi)


Predicting the properties of complex materials generally entails the use of methods that facilitate coarse-grained perspectives more suitable for large-scale modelling, and ultimately for device design and manufacture. When a quantum-level description of a modular system is required, this can often be facilitated by expressing the Hamiltonian in terms of a localised, real-space basis set, enabling it to be partitioned without ambiguity into sub-matrices that correspond to the individual subsystems. Maximally-localised Wannier functions (MLWFs) are particularly suitable in this context. Until now, however, MLWFs have been difficult to exploit in the high-throughput design of materials, because generating them required users to specify a set of initial guesses, typically trial functions localised in real space, chosen on the basis of experience and chemical intuition.


In a recent article[1], E-CAM[2] scientist Valerio Vitale and co-authors from the partner H2020 Centre of Excellence MaX[3] and the Swiss-based NCCR MARVEL[4] look afresh at this problem using the selected columns of the density matrix (SCDM) method, an algorithm introduced by Damle et al.[5] to compute a set of localised orbitals associated with the Kohn–Sham subspace for insulating systems. Here SCDM is used to provide the initial guesses for the MLWF search automatically. The approach has shown great promise in avoiding the need for user intervention in obtaining MLWFs, and it is robust, being based on standard linear-algebra routines rather than on iterative minimisation. In particular, Vitale et al. developed a fully automated protocol based on the SCDM algorithm in which the three remaining free parameters (two from the SCDM method, plus the choice of the target dimensionality for the disentangled subspace) are determined automatically, making the protocol parameter-free even in the case of entangled bands. The work systematically compares the accuracy and ease of use of three ways of generating localised basis sets: (a) MLWFs; (b) MLWFs combined with SCDM; and (c) SCDM alone. It applies this multifaceted perspective to hundreds of materials, including insulators, semiconductors and metals.
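The "standard linear-algebra routines" behind SCDM can be illustrated with a minimal sketch for the simplest case: an isolated set of orbitals sampled on a real-space grid at a single k-point. This is not the pw2wannier90 implementation. The function names, the toy plane-wave orbitals and the use of the inverse participation ratio as a crude localisation measure are all assumptions made for illustration. The column selection is written as an explicit greedy pivoting loop, which makes the same choices as a column-pivoted QR factorisation.

```python
import numpy as np


def select_columns(psi_dagger, n_select):
    """Greedy column pivoting, i.e. the column choice of a pivoted QR.

    Repeatedly pick the column with the largest remaining norm, then
    project that direction out of every column. Assumes the matrix has
    rank >= n_select, so the pivot norm never vanishes.
    """
    work = np.array(psi_dagger, dtype=complex)
    chosen = []
    for _ in range(n_select):
        norms = np.linalg.norm(work, axis=0)
        j = int(np.argmax(norms))
        chosen.append(j)
        q = work[:, j] / norms[j]
        work = work - np.outer(q, q.conj() @ work)
    return chosen


def scdm_orbitals(psi):
    """SCDM for an isolated set of orbitals (illustrative sketch).

    psi: array of shape (n_grid, n_bands) with orthonormal columns.
    Returns the localised orbitals, the unitary gauge matrix and the
    selected grid-point indices.
    """
    n_bands = psi.shape[1]
    cols = select_columns(psi.conj().T, n_bands)
    # Selected columns of the density matrix P = psi psi^dagger are psi @ xi.
    xi = psi[cols, :].conj().T
    # Loewdin orthonormalisation: replace xi by the closest unitary matrix.
    w, _, vh = np.linalg.svd(xi)
    u = w @ vh
    return psi @ u, u, cols


def inverse_participation_ratio(orbitals):
    """Mean of sum_x |phi(x)|^4 over orbitals; larger means more localised."""
    return float(np.mean(np.sum(np.abs(orbitals) ** 4, axis=0)))


# Toy input: four fully delocalised plane waves on a 64-point periodic grid.
n_grid, n_bands = 64, 4
x = np.arange(n_grid)
psi = np.exp(2j * np.pi * np.outer(x, np.arange(n_bands)) / n_grid) / np.sqrt(n_grid)

phi, u, cols = scdm_orbitals(psi)
print("selected grid points:", sorted(cols))
print("IPR before:", inverse_participation_ratio(psi))
print("IPR after: ", inverse_participation_ratio(phi))
```

The resulting unitary matrix plays the role of the initial guess that would otherwise come from user-chosen trial functions. The entangled-band case treated in the paper needs additional ingredients that are not sketched here.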

Comparison between Wannier-interpolated valence bands (red lines) and the full direct-DFT band structure (black lines), for 150 different materials. The direct and interpolated band structures are essentially indistinguishable (image credits: G. Pizzi)


This is significant because it greatly expands the scope of materials for which MLWFs can be generated in high-throughput studies. It has the potential to accelerate the design and discovery of materials with tailored properties using first-principles high-throughput (HT) calculations, and to facilitate advanced industrial applications.

Background information

This module is a collaboration between the E-CAM and MaX HPC centres of excellence, and the NCCR MARVEL.

In the SCDM Wannier Functions module, E-CAM implemented the SCDM algorithm in pw2wannier90, the interface code between the Quantum ESPRESSO software and the Wannier90 code. This was done in the context of an E-CAM pilot project at the University of Cambridge. Researchers then used this implementation as the basis for a complete computational workflow for obtaining MLWFs and electronic properties based on Wannier interpolation of the Brillouin zone, starting only from the specification of the initial crystal structure. The workflow was implemented within the AiiDA materials informatics platform (from the NCCR MARVEL and the MaX CoE), and used to perform an HT study on a dataset of 200 materials.
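To make "Wannier interpolation" concrete, the sketch below shows the basic idea on a toy model. Once the Hamiltonian is known in a localised basis as a short list of real-space matrices H(R), the bands at any k-point follow from a small Fourier sum and a diagonalisation. The two-orbital one-dimensional chain, its hopping values and the function names are hypothetical. The lattice constant is set to 1, and the degeneracy weights of lattice vectors used in production codes are omitted.

```python
import numpy as np


def toy_chain_hamiltonian(v, w):
    """Real-space Hamiltonian H(R) of a two-orbital 1D chain (toy model).

    v: hopping inside a unit cell, w: hopping between neighbouring cells.
    Returns a dict mapping the integer lattice vector R to a 2x2 matrix,
    with H(-R) equal to the Hermitian conjugate of H(R).
    """
    h_plus = np.array([[0.0, 0.0], [w, 0.0]])
    return {
        0: np.array([[0.0, v], [v, 0.0]]),
        1: h_plus,
        -1: h_plus.conj().T,
    }


def interpolate_bands(h_r, k_points):
    """Band energies at arbitrary k from H(k) = sum_R exp(i k R) H(R)."""
    bands = []
    for k in k_points:
        h_k = sum(np.exp(1j * k * r) * h for r, h in h_r.items())
        bands.append(np.linalg.eigvalsh(h_k))  # eigenvalues in ascending order
    return np.array(bands)


# Interpolate on a dense k-path using only three real-space matrices.
v, w = 1.0, 0.6
k_path = np.linspace(-np.pi, np.pi, 201)
bands = interpolate_bands(toy_chain_hamiltonian(v, w), k_path)
print("band array shape:", bands.shape)
print("minimum gap:", float(np.min(bands[:, 1] - bands[:, 0])))
```

For this toy model the interpolated bands can be checked against the closed form ±sqrt(v² + w² + 2vw cos k). In a real workflow the matrices H(R) would come from the MLWFs rather than being written down by hand.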

Source Code

See the Materials Cloud Archive entry. A downloadable virtual machine is provided that allows users to reproduce the results of the associated paper and to run new calculations for different materials, including all first-principles and atomistic simulations and the computational workflows.


[1] Automated high-throughput Wannierisation, Valerio Vitale, Giovanni Pizzi, Antimo Marrazzo, Jonathan R. Yates, Nicola Marzari and Arash A. Mostofi, npj Computational Materials 6, 66 (2020)




[5] Compressed Representation of Kohn–Sham Orbitals via Selected Columns of the Density Matrix, Anil Damle, Lin Lin and Lexing Ying, J. Chem. Theory Comput. 2015, 11, 1463–1469


Issue 13 – April 2020

E-CAM Newsletter of April 2020




E-CAM interview with Massimo Noro, Director of Business Development at STFC

In 2019, Massimo Noro was invited by the CECAM Headquarters at EPFL (E-CAM coordinator) to give a lecture in the framework of the CECAM/MARVEL Mary Ann Mansigh Conversation Series, entitled “Computer modelling for industrial applications”. E-CAM interviewed Massimo Noro on that occasion.

Particularly active in applying atomistic and coarse-grained simulations to study the interaction of nano-objects and surfactants with lipid bilayers for industrial applications (e.g. soaps, detergents, etc.), Massimo Noro has made considerable contributions to the development and application of the Dissipative Particle Dynamics (DPD) simulation technique to study soft condensed matter systems.

Massimo Noro is the former science leader of the High Performance Computing division at Unilever and the current Director of Business Development at the Science and Technology Facilities Council (STFC), with a focus on the Daresbury Campus (see short bio below). He is also a member of E-CAM’s Executive Board. In this interview, he talks about his journey from academic research to work at Unilever and now at STFC, and shares his insights on the use of simulation and modelling in industry and the role of STFC and research in this regard.

Watch Massimo Noro’s reply to three key questions of this interview:

Tell us about your journey from academic research, to work in Unilever and now at STFC

What are the key ingredients for the successful relationship between STFC and Industry

What do you think are the most important HPC needs for industry

Full video interview is available here, with the following outline:

Minute:Second (with direct link to the video) | Q&A # | Question
00:21 | 1 | Tell us about your journey from academic research, to work in Unilever and now at STFC
02:19 | 2 | Could you tell us about STFC and in particular its activities related to simulation
04:50 | 3 | What are the key ingredients for the successful relationship between STFC and Industry
08:13 | 4 | Can you give us an example of how simulation solved an industrial problem
09:26 | 5 | What do you think are the most important HPC needs for industry
12:18 | 6 | Do you think machine learning approaches combined with experiment will impact industrial R&D
14:05 | 7 | What is the role played by research software engineers
15:20 | 8 | What do you think are the barriers to enter an industry job
17:51 | 9 | What is the importance of open innovation in industrial R&D
20:04 | 10 | What is the importance of diversity on the work space

Massimo Noro

Massimo Noro is the Director of Business Development at the Science & Technology Facilities Council (STFC), with a focus on the Daresbury Campus. His role is to ensure the continued growth and success of the Daresbury Laboratory at the Sci-Tech Daresbury Campus.

Massimo joined STFC in February 2018, following a successful industrial R&D career at Unilever with a proven track record as program and people leader in a corporate environment – Unilever is a large multinational and a market leader in home care, personal care, refreshments and foods products. He gained considerable experience in managing high-budget projects and in leading teams across sites and across complex organisations. Massimo leads on strategic partnerships with industry and local government; he manages a wide team to deliver innovation, to develop strong pipelines of commercial engagements and to provide a range of offerings for business incubation.


Protein based biosensors: application in detecting influenza

Donal MacKernan, University College Dublin & E-CAM

An E-CAM transverse action is the development of a protein-based sensor (patent pending, filed by UCD[1,2]) with applications in medical diagnostics, scientific visualisation and therapeutics. At the heart of the sensor is a novel protein-based molecular switch which allows extremely sensitive real-time measurements of molecular targets to be made, and protein functions and other processes to be turned on or off accordingly (see Figure 1). For a description of the sensor, see this piece.

One application of the protein-based sensor is the detection of influenza, by modifying the sensor to measure ‘up-regulated epidermal growth factor receptor’ (EGFR) in living cells. The appeal of using it for the flu is that it is cheap, easy to use in the field by non-specialists, and accurate, that is, with very few false negatives and false positives compared to existing field tests. UCD’s patent-pending sensors have these attributes built into their ‘all-n-one’ design, through a novel type of molecular switch that performed well in the laboratory proof-of-concept phase. A funded research project to continue this development at UCD is almost certain, and likely to start within weeks.

And the answer to the currently frequently asked question “can we modify this sensor to quickly detect COVID-19?” is yes, provided we know the amino acid sequences of antibody-epitope pairs specific to this coronavirus.

Figure 1. Schematic illustration of the widely used sensor of Komatsu et al.[3] on the left, and of the “all-n-one” UCD sensor on the right, in the “OFF” and “ON” states corresponding to the absence and presence of the target biomarker respectively. The “all-n-one” sensor replaces the Komatsu flexible linker with a hinge protein carrying charged residues q1, q2, ..., which are placed symmetrically on either side of the centre so that, in the absence of the target, Coulomb repulsion forces the hinge to be open. Their location and number can be adjusted to suit each application. The spheres B and B’ denote the sensing modules, which tend to bind to each other when a target biomarker or analyte is present. The spheres A and A’ denote the reporting modules, which emit a recognisable (typically optical) signal when they are close to or in contact with each other, i.e. in the presence of a target biomarker or analyte.

[1] EP3265812A2, 2018-01-10, UNIV. COLLEGE DUBLIN NAT. UNIV. IRELAND. Inventors: Donal MacKernan and Shorujya Sanyal. Earliest priority: 2015-03-04, Earliest publication: 2016-09-09.  

[2] WO2018047110A1, 2018-03-15, UNIV. COLLEGE DUBLIN NAT. UNIV. IRELAND. Inventor: Donal MacKernan. Earliest priority: 2016-09-08, Earliest publication: 2018-03-15.

[3] Komatsu N., Aoki K., Yamada M., Yukinaga H., Fujita Y., Kamioka Y., Matsuda M., Development of an optimized backbone of FRET biosensors for kinases and GTPases. Mol. Biol. Cell. 2011 Dec; 22(23): 4647-56.


E-CAM Case Study: The development of the GC-AdResS scheme: from smooth coupling to a direct interface (abrupt)

Dr. Christian Krekeler, Freie Universität Berlin


GC-AdResS is a technique that speeds up computations without loss of accuracy for key system properties by dividing the simulation box into two or more regions with different levels of resolution: for instance, a high-resolution region where the molecules of the system are treated at an atomistic level of detail, other regions where molecules are treated at a coarse-grained level, and transition regions where a weighted average of the two resolutions is used. The goal of the E-CAM GC-AdResS pilot project was to eliminate the need for a transition region, so as to significantly improve performance and allow much greater flexibility. For example, the low-resolution region can be a particle reservoir (ranging in detail from coarse-grained to ideal-gas particles) coupled directly to a high-resolution atomistic region, without the transition region that was needed hitherto. The only requirement is that the two regions can exchange particles, and that a corresponding “thermodynamic” force is computed self-consistently, which turns out to be very simple to implement.
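The "weighted average of the two resolutions" can be sketched as follows. The cos² switching function and the pairwise force interpolation are the forms commonly quoted for AdResS. They are assumed here for illustration and are not taken from the pilot project's code. The function names, the one-dimensional geometry and the numerical values are hypothetical, and the thermodynamic force is not included. Setting the transition width to zero gives a step function, which is one simple way to picture the abrupt interface.

```python
import math


def resolution_weight(x, x_at, d_hy):
    """Resolution weight of a molecule at position x (1D sketch).

    Returns 1 in the atomistic region |x| <= x_at, 0 in the coarse-grained
    region |x| >= x_at + d_hy, and a smooth cos^2 switch in between.
    With d_hy = 0 the weight is a step function (abrupt interface).
    """
    d = abs(x)
    if d <= x_at:
        return 1.0
    if d >= x_at + d_hy:
        return 0.0
    return math.cos(math.pi * (d - x_at) / (2.0 * d_hy)) ** 2


def coupled_force(f_atomistic, f_coarse, w_i, w_j):
    """Pair force as a weighted average of the two resolutions."""
    lam = w_i * w_j
    return lam * f_atomistic + (1.0 - lam) * f_coarse


# Hypothetical setup: atomistic region of half-width 1.0, transition width 0.5.
x_at, d_hy = 1.0, 0.5
for x in (0.0, 1.25, 2.0):
    print(x, "smooth:", resolution_weight(x, x_at, d_hy),
          "abrupt:", resolution_weight(x, x_at, 0.0))
```

With the smooth switch, both force fields must be evaluated for pairs inside the transition region. With the step function each pair sees only one of them, which is where the performance gain described above would come from.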

Continue reading…

A Conversation on The Fourth Industrial Revolution: Opportunities & Trends for Particle Based Simulation



In the margins of a recent multiscale simulation workshop, a discussion began between a prominent pharmaceutical industry scientist and representatives of E-CAM and EMMC regarding the unfolding Fourth Industrial Revolution and the role of particle-based simulation and statistical methods in it. The impact of simulation is predicted to become very significant. This discussion is intended to raise awareness among the general public of how Industry 4.0 is taking hold in companies, and of how academic research will support that transformation.

Authors: Prof. Pietro Asinari (EMMC and Politecnico di Torino, denoted below as PA), Dr. Donal MacKernan (E-CAM and University College Dublin, denoted below as DM), and a prominent pharmaceutical industry scientist (name withheld at the author’s request as the views expressed are personal, denoted below as IS)

Continue reading…

Mesoscale simulation of billion-atom complex systems using thousands of GPGPUs, an industry success story

Dr. Jony Castagna, Science and Technology Facilities Council, United Kingdom


Jony Castagna recounts his transition from industry scientist to research software developer at the STFC, his E-CAM rewrite of DL_MESO allowing the simulation of billion-atom systems on thousands of GPGPUs, and his latest role as Nvidia ambassador focused on machine learning.

Continue reading…

Software vendor SMEs as a boost for technology transfer in industrial simulative pipelines

The E-CAM Scoping Workshop “Building the bridge between theories and software: SME as a boost for technology transfer in industrial simulative pipelines”, organised in May 2018 at the Fondazione Istituto Italiano di Tecnologia (IIT), Genoa, brought together top-level scientists of the E-CAM community with expertise in statistical mechanics, multi-scale modeling and electronic structure, and representatives of the pharmaceutical and materials industries. The objective was to identify the major gaps which still hamper a systematic exploitation of accurate computer simulations in industrial R&D. Special attention was given to the role of SMEs devoted to simulative software development, and several software vendor SMEs were present at the meeting.

The meeting highlighted the role of software vendor SMEs as a key link for the uptake of modelling in industry. They can play an increasingly important role not only in translating the science developed in academia into a proper technological transfer process, but also in building a scientific bridge between the industry requirements in terms of automation and the new theories and algorithms developed at an academic level. There was also a consensus that EU-funded Centers of Excellence for Computing Applications, such as E-CAM, can provide an opportunity to enhance the expertise and scope of software vendor SMEs.

Read the full report here.


E-CAM Case Study: Mesoscale models for polarisable solvents: application to oil-water interfaces

Dr. Silvia Chiacchiera, Science and Technology Facilities Council, United Kingdom


Water is a polar liquid and has a dielectric permittivity much higher than that of typical apolar liquids, such as light oils. This strong dielectric contrast at water-oil interfaces affects electrostatics, and it is important to include these effects when describing, for example, biomolecular processes and water-oil mixtures involving surfactants, such as detergents. In this pilot project, developed in collaboration with Unilever and Manchester University, we have proposed and analysed a class of polarisable solvent models to be used in Dissipative Particle Dynamics (DPD), a coarse-grained particle-based simulation method commonly used in various industrial sectors. Related software modules for the DL_MESO package have also been developed.

Continue reading…


The Curse of Dimensionality in Data-Intensive Modeling in Medicine, Biology, and Diagnostics

With Prof. Tim Conrad (TC), Free University of Berlin, and Dr. Donal MacKernan (DM), University College Dublin.


Until recently, the idea that methods rooted in statistical physics could be used to elucidate phenomena and underlying mechanisms in biology and medicine was widely considered to be a distant dream. Elements of that dream are beginning to be realized, aided very considerably by machine learning and advances in measurement, exemplified by the development of large-scale biomedical data analysis for next-generation diagnostics. In this E-CAM interview of Tim Conrad, the growing importance of diagnostics in medicine and biology is discussed. One difficulty faced by such developments and shared with particle-based simulation is the “curse of dimensionality”. It is manifest in problems such as: (a) the use of a very large number of order parameters when trying to identify reaction mechanisms, nucleation pathways, metastable states and reaction rates, or in polymorph recognition; (b) machine learning applied to electronic structure problems, where, for example, neural-network-based potentials need very high-dimensional basis sets; (c) systematic coarse-graining, which would ideally start with a very high-dimensional space and systematically reduce the dimension. The opportunities and challenges for scientists engaging with industry are also discussed. Tim Conrad is Professor of “Medical Bioinformatics” at the Institute of Mathematics of the Free University of Berlin and head of MedLab, one of the four laboratories of the MODAL research campus. MODAL is a public-private partnership project which conducts mathematical research on data-intensive modeling, simulation, and optimization of complex processes in the fields of energy, health, mobility, and communication. Tim Conrad is also the founder of three successful start-up companies.

In this E-CAM interview with Prof. Tim Conrad, the growing importance of diagnostics in medicine and biology is discussed, including concepts rooted in signal analysis relevant to systematic dimensional reduction, and pattern recognition, and the possibilities of their application to systematic coarse-graining. The opportunities and challenges for scientists of start-up companies are also discussed based on experience.


Continue reading…