E-CAM High Throughput Computing Library

This module is the first in a sequence that will form the overall capabilities of the E-CAM High Throughput Computing (HTC) library. In particular, this module provides a set of decorators that wrap the Dask-Jobqueue Python library, with the aim of reducing the development time needed to use it for our use cases.

The initial motivation for this library is driven by the ensemble-type calculations that are required in many scientific fields, and in particular in the materials science domain in which the E-CAM Centre of Excellence operates.

One specific application for this module is the study of “rare events” in theoretical and computational chemistry, a particularly relevant topic for E-CAM. Many problems in biological chemistry, materials science, and other fields involve events that occur spontaneously only after a millisecond or longer (for example, biomolecular conformational changes, or nucleation processes). With the femtosecond-scale time steps typical of molecular dynamics, around 10^12 time steps would be needed to see a single millisecond-scale event.
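The step count quoted above can be checked with a line of arithmetic. The sketch below assumes a 1 fs integration time step, which is a typical choice in atomistic molecular dynamics and not a value taken from the module itself.

```python
# Assumption: a 1 fs integration time step (typical of atomistic MD).
time_step_s = 1e-15   # one femtosecond, in seconds
event_time_s = 1e-3   # one millisecond, in seconds

# Number of consecutive time steps needed to cover one millisecond.
steps = event_time_s / time_step_s
print(f"{steps:.0e} time steps")
```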

Modern supercomputers are beginning to make it possible to obtain trajectories long enough to observe some of these processes, but to fully characterize a transition with proper statistics, many examples are needed. To obtain many examples, the same application must be run many thousands of times with varying inputs. To manage this kind of computation, a task-scheduling high throughput computing (HTC) library is needed. The main elements of such a library are: task definition, task scheduling and task execution.
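As a rough illustration of how a decorator can tie together task definition, scheduling and execution, here is a minimal sketch. It is not the module's API: the decorator name `on_cluster` and the function `simulate` are hypothetical, and a local thread pool from the Python standard library stands in for a Dask-Jobqueue cluster so that the example runs anywhere.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for a cluster-backed executor. In the real module this role is
# played by Dask-Jobqueue; the local thread pool is an assumption made
# here purely so that the sketch is self-contained.
_executor = ThreadPoolExecutor(max_workers=4)


def on_cluster(func):
    """Hypothetical decorator: calling the function submits it as a task."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> Future:
        # Task scheduling: hand the call to the executor and return a
        # future immediately instead of blocking on the result.
        return _executor.submit(func, *args, **kwargs)

    return wrapper


# Task definition: an ordinary function, marked for remote execution.
@on_cluster
def simulate(seed: int) -> int:
    # Placeholder for one ensemble member (e.g. one trajectory).
    return seed * seed


# Task execution: launch an ensemble of independent tasks, then gather.
futures = [simulate(seed) for seed in range(8)]
results = [f.result() for f in futures]
print(results)
```

The point of the decorator form is that the scientific function stays unchanged; only the one-line annotation decides where it runs.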

While HTC workloads have traditionally been looked down upon in the HPC space, there is a genuine scientific use case for running them on extreme-scale resources, and algorithms that require a coordinated approach make efficient libraries implementing it increasingly important. The 5 Petaflop booster technology of JURECA is an interesting concept in this respect, since its approach of offloading heavy computation fits well with the concept outlined here.

Module documentation at https://e-cam.readthedocs.io/en/latest/Classical-MD-Modules/modules/HTC/decorators/readme.html

The CECAM Electronic Structure Library and the modular software development paradigm

E-CAM has been working closely with the Electronic Structure Library (ESL) initiative for some years now. A review of the CECAM ESL is now out and can be accessed at https://arxiv.org/abs/2005.05756. The abstract is below.

Abstract

First-principles electronic structure calculations are very widely used thanks to the many successful software packages available. Their traditional coding paradigm is monolithic, i.e., regardless of how modular its internal structure may be, the code is built independently from others, from the compiler up, with the exception of linear-algebra and message-passing libraries. This model has been quite successful for decades. The rapid progress in methodology, however, has resulted in an ever increasing complexity of those programs, which implies a growing amount of replication in coding and in the recurrent re-engineering needed to adapt to evolving hardware architecture. The Electronic Structure Library (ESL) was initiated by CECAM to catalyze a paradigm shift away from the monolithic model and promote modularization, with the ambition to extract common tasks from electronic structure programs and redesign them as free, open-source libraries. They include “heavy-duty” ones with a high degree of parallelisation, and potential for adaptation to novel hardware within them, thereby separating the sophisticated computer science aspects of performance optimization and re-engineering from the computational science done by scientists when implementing new ideas. It is a community effort, undertaken by developers of various successful codes, now facing the challenges arising in the new model. This modular paradigm will improve overall coding efficiency and enable specialists (computer scientists or computational scientists) to use their skills more effectively. It will lead to a more sustainable and dynamic evolution of software as well as lower barriers to entry for new developers.

Issue 13 – April 2020

E-CAM Newsletter of April 2020

Get the latest news from E-CAM: sign up for our newsletter.

E-CAM interview with Massimo Noro, Director of Business Development at STFC

In 2019, Massimo Noro was invited by the CECAM Headquarters at EPFL (E-CAM coordinator) to give a lecture in the framework of the CECAM/MARVEL Mary Ann Mansigh Conversation Series, entitled “Computer modelling for industrial applications”. E-CAM interviewed Massimo Noro on that occasion.

Particularly active in applying atomistic and coarse-grained simulations to study the interaction of nano-objects and surfactants with lipid bilayers for industrial applications (e.g. soaps, detergents, etc.), Massimo Noro has made considerable contributions to the development and application of the Dissipative Particle Dynamics (DPD) simulation technique to study soft condensed matter systems.

Massimo is a former science leader of the High Performance Computing division at Unilever and the current Director of Business Development at the Science and Technology Facilities Council (STFC), with a focus on the Daresbury Campus (see short bio below). He is also a member of E-CAM’s Executive Board. In this interview, he talks about his journey from academic research to work at Unilever and now at STFC, and shares his insights on the use of simulation and modelling in industry and the role of STFC and research in this regard.

Watch Massimo Noro’s reply to three key questions of this interview:

Tell us about your journey from academic research, to work in Unilever and now at STFC

What are the key ingredients for the successful relationship between STFC and Industry

What do you think are the most important HPC needs for industry

The full video interview is available here, with the following outline:

Minute:Second (with direct link to the video)   Q&A #   Question
00:21                                           1       Tell us about your journey from academic research, to work in Unilever and now at STFC
02:19                                           2       Could you tell us about STFC and in particular its activities related to simulation
04:50                                           3       What are the key ingredients for the successful relationship between STFC and Industry
08:13                                           4       Can you give us an example of how simulation solved an industrial problem
09:26                                           5       What do you think are the most important HPC needs for industry
12:18                                           6       Do you think machine learning approaches combined with experiment will impact industrial R&D
14:05                                           7       What is the role played by research software engineers
15:20                                           8       What do you think are the barriers to enter an industry job
17:51                                           9       What is the importance of open innovation in industrial R&D
20:04                                           10      What is the importance of diversity on the work space

Massimo Noro

Massimo Noro is the Director of Business Development at the Science & Technology Facilities Council (STFC), with a focus on the Daresbury Campus. His role is to ensure the continued growth and success of the Daresbury Laboratory at the Sci-Tech Daresbury Campus.

Massimo joined STFC in February 2018, following a successful industrial R&D career at Unilever with a proven track record as a program and people leader in a corporate environment – Unilever is a large multinational and a market leader in home care, personal care, refreshments and food products. He gained considerable experience in managing high-budget projects and in leading teams across sites and across complex organisations. Massimo leads on strategic partnerships with industry and local government; he manages a wide team to deliver innovation, to develop strong pipelines of commercial engagements and to provide a range of offerings for business incubation.

Some useful tips to help moving to online training

E-CAM has built up a collection of (hopefully) useful information to help our community, other Centres of Excellence, and interested groups transition to online training. The information comes from community-contributed sources and from our own experience in capturing and broadcasting E-CAM training events. Guides to help with online training are being rapidly created as the COVID-19 crisis evolves, and we try to keep the information here moderated to avoid overwhelming people.

This collection, “Moving to online learning”, is available through E-CAM’s ONLINE TRAINING PORTAL and includes the following items:

If you know of something that could be of value in this list, please email E-CAM Software Manager Alan O’Cais at a.ocais@fz-juelich.de.

Automated high-throughput Wannierisation, a successful collaboration between E-CAM and the MaX Centre of Excellence

Maximally-localised Wannier functions (MLWFs) are routinely used to compute from first-principles advanced materials properties that require very dense Brillouin zone (BZ) integration and to build accurate tight-binding models for scale-bridging simulations. At the same time, high-throughput (HT) computational materials design is an emergent field that promises to accelerate the reliable and cost-effective design and optimisation of new materials with target properties. The use of MLWFs in HT workflows has been hampered by the fact that generating MLWFs automatically and robustly without any user intervention and for arbitrary materials is, in general, very challenging. We address this problem directly by proposing a procedure for automatically generating MLWFs for HT frameworks. Our approach is based on the selected columns of the density matrix method (SCDM, see SCDM Wannier Functions) and is implemented in an AiiDA workflow.

Purpose of the module

Create a fully-automated protocol based on the SCDM algorithm for the construction of MLWFs, in which the two free parameters are determined automatically (in our HT approach the dimensionality of the disentangled space is fixed by the total number of states used to generate the pseudopotentials in the DFT calculations).

A paper describing the work is available at https://arxiv.org/abs/1909.00433, where this approach was applied to a dataset of 200 bulk crystalline materials that span a wide structural and chemical space.

Background information

This module is a collaboration between E-CAM and the MaX Centre of Excellence.

In the SCDM Wannier Functions module, E-CAM has implemented the SCDM algorithm in the pw2wannier90.f90 interface code between the Quantum ESPRESSO software and the Wannier90 code. This implementation was used as the basis for a complete computational workflow for obtaining MLWFs and electronic properties based on Wannier interpolation of the BZ, starting only from the specification of the initial crystal structure. The workflow was implemented within the AiiDA materials informatics platform, and used to perform an HT study on a dataset of 200 materials, as described here.

More information at https://e-cam.readthedocs.io/en/latest/Electronic-Structure-Modules/modules/W90_MaX_collab/readme.html

Protein based biosensors: application in detecting influenza

Donal MacKernan, University College Dublin & E-CAM

An E-CAM transverse action is the development of a protein based sensor (patent pending, filed by UCD [1,2]) with applications in medical diagnostics, scientific visualisation and therapeutics. At the heart of the sensor is a novel protein based molecular switch which allows extremely sensitive real time measurement of molecular targets to be made, and protein functions and other processes to be turned on or off accordingly (see Figure 1). For a description of the sensor, see this piece.

One application of the protein based sensor is the detection of influenza, by modifying the sensor to measure ‘up regulated Epidermal growth factor receptor’ (EGFR) in living cells. The appeal of using it for the flu is that it is cheap, easy to use in the field by non-specialists, and accurate, that is, with very few false negatives and positives compared to existing field tests. UCD’s patent pending sensors have these attributes built into their ‘all-n-one’ design, through a novel type of molecular switch that performed well in the laboratory proof of concept phase. A funded research project to continue this development at UCD is almost certain, and likely to start within weeks.

And the answer to the current frequently asked question “can we modify this sensor to quickly detect COVID-19?” is yes, provided we know the amino acid sequences of antibody-epitope pairs specific to this coronavirus.

Figure 1. Schematic illustration of a widely used sensor by Komatsu et al. [3] on the left and the “all-n-one” UCD sensor on the right in the “OFF” and “ON” states, corresponding to the absence and presence of the target biomarker respectively. The “all-n-one” sensor replaces the flexible Komatsu linker with a hinge protein carrying charged residues q1, q2, ..., which are placed symmetrically on either side of the centre so that, in the absence of the target, Coulomb repulsion forces the hinge open. Their location and number can be adjusted to suit each application. The spheres B and B’ denote the sensing modules, which tend to bind to each other when a target biomarker or analyte is present. The spheres A and A’ denote the reporting modules, which emit a recognisable (typically optical) signal when they are close to or in contact with each other, i.e. in the presence of a target biomarker or analyte.

[1] EP3265812A2, 2018-01-10, UNIV. COLLEGE DUBLIN NAT. UNIV. IRELAND. Inventors: Donal MacKernan and Shorujya Sanyal. Earliest priority: 2015-03-04, Earliest publication: 2016-09-09. https://worldwide.espacenet.com/patent/search?q=pn%3DEP3265812A2  

[2] WO2018047110A1, 2018-03-15, UNIV. COLLEGE DUBLIN NAT. UNIV. IRELAND. Inventor: Donal MacKernan. Earliest priority: 2016-09-08, Earliest publication: 2018-03-15. https://worldwide.espacenet.com/patent/search?q=pn%3DWO2018047110A1

[3] Komatsu N., Aoki K., Yamada M., Yukinaga H., Fujita Y., Kamioka Y., Matsuda M., Development of an optimized backbone of FRET biosensors for kinases and GTPases. Mol. Biol. Cell. 2011 Dec; 22(23): 4647-56.

QMCPack Interfaces for Electronic Structure Computations

Quantum Monte Carlo (QMC) methods are a class of ab initio, stochastic techniques for the study of quantum systems. While QMC simulations are computationally expensive, they have the advantage of being accurate, fully ab initio and scalable to a large number of cores with limited memory requirements.

These features make QMC methods a valuable tool to assess the accuracy of DFT computations, which are widely used in the fields of condensed matter physics, quantum chemistry and material science.

QMCPack is a free package for QMC simulations of electronic structure developed in several national labs in the US. The package is written in object oriented C++, offers great flexibility in the choice of systems, trial wave functions and QMC methods, and supports massive parallelism and the use of GPUs.

Trial wave functions for electronic QMC computations commonly require the use of single-electron orbitals, typically computed by DFT. The aim of the E-CAM pilot project described here is to build interfaces between QMCPack and other software packages for electronic structure computations, e.g. the DFT code Quantum Espresso.

These interfaces are used to manage the reading of orbitals or their DFT generation within QMCPack, to establish an automated, black box workflow for QMC computations. QMC simulations can, for example, be used to benchmark and validate DFT calculations: such a procedure can be employed in the study of several physical systems of interest in condensed matter physics, chemistry or materials science, with applications in industry, e.g. in the study of metal-ion or water-carbon interfaces.

The following modules have been built as part of this pilot project:

  • QMCQEPack, which provides the files to download and properly patch Quantum Espresso 5.3 to build the libpwinterface.so library; this library is required to use the module ESPWSCFInterface to generate single particle orbitals during a QMCPack computation using Quantum Espresso.
  • ESInterfaceBase, which provides a base class for a general interface to generate single particle orbitals to be used in QMC simulations in QMCPack; implementations of specific interfaces as derived classes of ESInterfaceBase are available as separate modules, as follows:

Documentation on the interfaces in QMCPack can be found in the QMCPack user manual at https://github.com/michruggeri/qmcpack/blob/f88a419ad1a24c68b2fdc345ad141e05ed0ab178/manual/interfaces.tex

New publication is out: “Towards extreme scale dissipative particle dynamics simulations using multiple GPGPUs”

E-CAM researchers working at the Hartree Centre – Daresbury Laboratory have co-designed the DL_MESO Mesoscale Simulation package to run on multiple GPUs, and ran for the first time a Dissipative Particle Dynamics simulation of a very large system (1.8 billion particles) on 4096 GPUs.

Towards extreme scale dissipative particle dynamics simulations using multiple GPGPUs
J. Castagna, X. Guo, M. Seaton and A. O’Cais
Computer Physics Communications (2020) 107159
DOI: 10.1016/j.cpc.2020.107159 (open access)

Abstract

A multi-GPGPU development for Mesoscale Simulations using the Dissipative Particle Dynamics method is presented. This distributed GPU acceleration development is an extension of the DL_MESO package to MPI+CUDA in order to exploit the computational power of the latest NVIDIA cards on hybrid CPU–GPU architectures. Details about the extensively applicable algorithm implementation and memory coalescing data structures are presented. The key algorithms’ optimizations for the nearest-neighbour list searching of particle pairs for short range forces, exchange of data and overlapping between computation and communications are also given. We have carried out strong and weak scaling performance analyses with up to 4096 GPUs. A two phase mixture separation test case with 1.8 billion particles has been run on the Piz Daint supercomputer from the Swiss National Supercomputer Center. With CUDA aware MPI, proper GPU affinity, communication and computation overlap optimizations for multi-GPU version, the final optimization results demonstrated more than 94% efficiency for weak scaling and more than 80% efficiency for strong scaling. As far as we know, this is the first report in the literature of DPD simulations being run on this large number of GPUs. The remaining challenges and future work are also discussed at the end of the paper.
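For readers less familiar with the two efficiency figures quoted in the abstract, the sketch below uses the usual textbook definitions of strong and weak scaling efficiency. The function names are hypothetical and the timings are made-up placeholders for illustration only; they are not measurements from the paper.

```python
def strong_scaling_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Fixed total problem size: the ideal runtime falls as n_ref / n."""
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref: float, t_n: float) -> float:
    """Fixed problem size per GPU: the ideal runtime stays constant."""
    return t_ref / t_n


# Hypothetical timings in seconds (assumptions, not data from the paper).
strong = strong_scaling_efficiency(t_ref=100.0, n_ref=1, t_n=31.25, n=4)
weak = weak_scaling_efficiency(t_ref=100.0, t_n=125.0)
print(f"strong: {strong:.0%}, weak: {weak:.0%}")
```

An efficiency of 1.0 (100%) would mean perfect scaling in either sense; values below that quantify the overhead of communication and load imbalance.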

6 software modules delivered in the area of Quantum Dynamics

In this report for Deliverable 3.5 of E-CAM [1], 6 software modules in quantum dynamics are presented.

All modules stem from the activities initiated during the State-of-the-Art Workshop held at Lyon (France) in June 2019 and the Extended Software Development Workshop in Quantum Dynamics, held at Durham University (UK) in July 2019. The modules originate from the input of E-CAM’s academic user base. They have been developed by members of the project (S. Bonella – EPFL), established collaborators (G. Worth – University College London, S. Gomez – University of Vienna, C. Sanz – University of Madrid, D. Lauvergnat – University of Paris Sud) and new contributors to the E-CAM repository (F. Agostini – University of Paris Sud, Basile Curchod – University of Durham, A. Schild – ETH Zurich, S. Hupper and T. Plé – Sorbonne University, G. Christopoulou – University College London). The presence of new contributors indicates the interest of the community in our efforts. Furthermore, the contributors to modules in WP3 continue to be at different stages of their careers (in particular, Thomas Plé and G. Christopoulou are PhD students) highlighting the training value of our activities.

Following the order of presentation, the 6 modules are named: CLstunfti, PIM_QTB, PerGauss, Direct Dynamics Database, Exact Factorization Analysis Code (EFAC), and GuessSOC. In this report, a short description is written for each module, followed by a link to the respective Merge-Request document on the GitLab service of E-CAM. These merge requests contain detailed information about the code development, testing and documentation of the modules.

[1] “D3.5.: Quantum dynamics e-cam modules IV,” Dec. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3598325

Full report available here.