The module minDist2segments_KKT returns the minimal distance between two line segments. It uses the Karush-Kuhn-Tucker conditions (KKT) for the minimization under constraints.
Practical application
We use the present module to avoid topology violations in an entangled polymer system. To preserve the topology in a system of entangled polymers we need to determine the minimal distance between two bonds. Once done we can apply either a soft or hard core potential to avoid the crossing of two bonds. Here, we propose to determine the minimal distance between two segments with the help of the Karush-Kuhn-Tucker conditions.
This module is a part of an E-CAM pilot project at the ENS Lyon, focused on the implementation of contact joint to resolve excluded volume constraints
This module is used by other ongoing work, such as module velocities_resolve_EV, that resolves the excluded volume constraint with a velocity formulation.
In a recent paper[1], researchers from the Centres of Excellence E-CAM[2] and MaX[3], and the centre for Computational Design and Discovery of Novel Materials NCCR MARVEL[4], have proposed a new procedure for automatically generating Maximally-Localised Wannier functions (MLWFs) for high-throughput frameworks. The methodology and associated software can be used for hitherto difficult cases of entangled bands, and allows the electronic properties of a wide variety of materials to be obtained starting only from the specification of the initial crystal structure, including insulators, semiconductors and metals. Industrial applications that this work will facilitate include the development of novel superconductors, multiferroics, topological insulators, as well as more traditional electronic applications.
Challenge/context
Predicting the properties of complex materials generally entails the use of methods that facilitate coarse grained perspectives more suitable for large scale modelling, and ultimately device design and manufacture. When a quantum level of description of a modular-like system is required, this can often be facilitated by expressing the Hamiltonian in terms of a localised, real-space basis set, enabling it to be partitioned without ambiguity into sub-matrices that correspond to the individual subsystems. Maximally-localised Wannier functions (MLWFs) are particularly suitable in this context. However, until now generating MLWFs has been difficult to exploit in high-throughput design of materials, without the specification by users of a set of initial guesses for the MLWFs, typically trial functions localised in real space, based on their experience and chemical intuition.
Solution
E-CAM[2] scientist Valerio Vitale and co-authors from the partner H2020 Centre of Excellence MAX[3] and the Swiss based NCCR MARVEL [4] in a recent article[1] look afresh at this problem in the context of an algorithm by Damle et al[5], known as the selected columns of the density matrix (SCDM) method, as a method to provide automatically initial guesses for the MLWF search, to compute a set of localized orbitals associated with the Kohn–Sham subspace for insulating systems. This has shown great promise in avoiding the need for user intervention in obtaining MLWFs and is robust, being based on standard linear-algebra routines rather than on iterative minimisation. In particular, Vitale et al. developed a fully-automated protocol based on the SCDM algorithm in which the three remaining free parameters (two from the SCDM method, plus the choice of the target dimensionality for the disentangled subspace) are determined automatically, making it thus parameter-free even in the case of entangled bands. The work systematically compares the accuracy and ease of use of standard methods to generate localised basis sets as (a) MLWFs; (b) MLWFs combined with SCDM’s and (c) using solely SCDM’s; and applies this multifaceted perspective to hundreds of materials including insulators, semiconductors and metals.
Benefit
This is significant because it greatly expands the scope of materials for which MLWFs can be generated in high throughput studies and has the potential to accelerate the design and discovery of materials with tailored properties using first-principles high-throughput (HT) calculations, and facilitate advanced industrial applications. Industrial applications that this work will facilitate include the development of novel superconductors, multiferroics, topological insulators, as well as more traditional electronic applications.
Background information
This module is a collaboration between the E-CAM and MaX HPC centres of excellence, and the NCCR MARVEL.
In SCDM Wannier Functions, E-CAM has implemented the SCDM algorithm in the pw2wannier90 interface code between the Quantum ESPRESSO software and the Wannier90 code. This was done in the context of an E-CAM pilot project at the University of Cambridge. Researchers have then used this implementation as the basis for a complete computational workflow for obtaining MLWFs and electronic properties based on Wannier interpolation of the Brillouin zone, starting only from the specification of the initial crystal structure. The workflow was implemented within the AiiDA materials informatics platform (from the NCCR MARVEL and the MaX CoE) , and used to perform a HT study on a dataset of 200 materials.
Source Code
See the Materials Cloud Archive entry. A downloadable virtual machine is provided that allows to reproduce the results of the associated paper and also to run new calculations for different materials, including all first-principles and atomistic simulations and the computational workflows.
The new CECAM webinar series entitled “The importance of being H.P.C. Earnest”, will focus on of HPC as an enabler of leading-edge simulation and modelling, and on the science made possible by combining state-of-the-art methods with optimal exploitation of supercomputing resources.
A series of 5 CECAM webinars will be held every Thursday 15:00-17:00 (CEST) and broadcasted live on the CECAM YouTube Channel, starting on June 18 2020.
Different experts, who are also key players in projects targeting software development for high-end computational facilities, such as the European Centers of Excellence for Computing Applications and analogous initiatives based in the United States of America, will be present for this occasion.
The E-CAM Centre of Excellence will be featured on Thursday 2 July 2020 by Prof. Ignacio Pagonabarraga, CECAM Director and Technical Manager of E-CAM.
The full programme for the webinar series is the following:
Chapter 1: Thursday, 18 June 2020
Nicola Marzari – EPFL Claudia Filippi – University of Twente Anthony Scemama – University of Toulouse III Giulia Galli – University Of Chicago And Argonne National Laboratory
Chapter 2 : Thursday, 25 June 2020
Erik Lindahl – Stockholm University Jesus Labarta – Barcelona Supercomputing Center Paul Kent – Oak Ridge National Laboratory
Chapter 3 : Thursday, 2 July 2020
Cecilia Clementi – Freie Universität Berlin Ignacio Pagonabarraga – CECAM Peter Coveney – University College London and University of Amsterdam
Chapter 4 : Thursday, 9 July 2020
Edouard Audit – CEA Elisa Molinari – University of Modena Gianluca Palermo – Politecnico di Milano
Chapter 5: Thursday 16 July 2020
Steven G. Louie – University of Berkeley Claudia Draxl – Humboldt University Berlin
HPC facilities are a major capital investment and often run close to capacity. Improving the efficiency of application software running on these facilities either speeds up time to solution or allows for larger, more challenging problems to be solved. The Performance Optimisation and Productivity (POP) Centre of Excellence exists to help academic and industry groups identify how their software can be improved, free of charge. Funded by the EU under the Horizon 2020 Research and Innovation Programme, POP puts the world-class HPC expertise of eight commercial and academic partners at the disposal of European Scientists and Industry.
Collaborations with the POP CoE
Given that POP is home to a large set of performance experts, E-CAM has collaborated with them on (to date) two applications that are of particular interest to E-CAM with respect to extreme scalability: ESPResSo++ and PaPIM. We have also benefitted from their HPC specialists in one of our Extended Software Development Workshops organized by the Electronic Structure Library initiative[1] (ESL), where POP’s experts provided a 1.5 day Tutorial on advanced performance and scalability profiling of the ESL libraries.
Successful collaboration with POP: Optimization of PaPIM
POP carried out a study of PaPIM[2] which resulted in a 10 page report on its performance, highlighting issues in the code and proposing remedies. For example, the report showed that load imbalance issues in the expensive part of the application was mainly related to an uneven spread of the sample groups among the MPI tasks. Of more interest was the communication pattern, where the POP analysis showed that replacing a number of successive collective communications with a single collective of a derived data type could lead to a 4.7 x improvement in communication performance.
How it works
A simple request form should be completed at https://pop-coe.eu/request-service-form. One of their technical experts will be in touch to obtain the details.
Briefly, POP services involve the following steps.
The first step is to profile the application behaviour using suitable parallel profiling tools, e.g. Extrae or Scalasca. This step creates trace files which require analysis by POP experts. This is typically done on the user’s machines. However, if this is not an option for a user, POP can collect performance data on one of their HPC machines. This task can be done either by POP experts or by users with POP support.
The results from the analysis of the trace files are presented to the user, explaining the performance issues with the code and recommendations for performance improvements. Experience shows that it is often difficult to build a quantitative picture of parallel application behaviour. One of the strengths of POP is their set of metrics, which provide a standard, objective way to characterise different aspects of the performance of parallel codes.
POP performance assessment can be followed up by further work, again completely free to the user, to demonstrate how to implement these improvements.
A feature that is particularly useful when dealing with industrial partnerships, is that POP services don’t require access to the source code – they can work with executables. And if needed, non-disclosure agreements can be signed.
[2] PaPIM is a code for computing time-dependent correlation functions and sampling of the phase space. It samples the phase space either classically or quantum mechanically. Documentation available here.
This module is the first in a sequence that will form the overall capabilities of the E-CAM High Throughout Computing (HTC) library. In particular this module deals with creating a set of decorators to wrap around the Dask-Jobqueue Python library, which aspires to make the development time cost of leveraging it lower for our use cases.
The initial motivation for this library is driven by the ensemble-type calculations that are required in many scientific fields, and in particular in the materials science domain in which the E-CAM Centre of Excellence operates.
One specific application for this module is the study of “rare events” in theoretical and computational chemistry, a particularly relevant topic for E-CAM . Many problems in biological chemistry, materials science, and other fields involve events that only spontaneously occur after a millisecond or longer (for example, biomolecular conformational changes, or nucleation processes). That means that around 1012 time steps would be needed to see a single millisecond-scale event.
Modern supercomputers are beginning to make it possible to obtain trajectories long enough to observe some of these processes, but to fully characterize a transition with proper statistics, many examples are needed. In order to obtain many examples the same application must be run many thousands of times with varying inputs. To manage this kind of computation a task scheduling high throughput computing (HTC) library is needed. The main elements of the mentioned scheduling library are: task definition, task scheduling and task execution.
While traditionally an HTC workload is looked down upon in the HPC space, the scientific use case for extreme-scale resources exists and algorithms that require a coordinated approach make efficient libraries that implement this approach increasingly important in the HPC space. The 5 Petaflop booster technology of JURECA is an interesting concept with respect to this approach since the offloading approach of heavy computation marries perfectly to the concept outlined here.
First-principles electronic structure calculations are very widely used thanks to the many successful software packages available. Their traditional coding paradigm is monolithic, i.e., regardless of how modular its internal structure may be, the code is built independently from others, from the compiler up, with the exception of linear-algebra and message-passing libraries. This model has been quite successful for decades. The rapid progress in methodology, however, has resulted in an ever increasing complexity of those programs, which implies a growing amount of replication in coding and in the recurrent re-engineering needed to adapt to evolving hardware architecture. The Electronic Structure Library (ESL) was initiated by CECAM to catalyze a paradigm shift away from the monolithic model and promote modularization, with the ambition to extract common tasks from electronic structure programs and redesign them as free, open-source libraries. They include “heavy-duty” ones with a high degree of parallelisation, and potential for adaptation to novel hardware within them, thereby separating the sophisticated computer science aspects of performance optimization and re-engineering from the computational science done by scientists when implementing new ideas. It is a community effort, undertaken by developers of various successful codes, now facing the challenges arising in the new model. This modular paradigm will improve overall coding efficiency and enable specialists (computer scientists or computational scientists) to use their skills more effectively. It will lead to a more sustainable and dynamic evolution of software as well as lower barriers to entry for new developers.
Particularly active in applying atomistic and coarse-grained simulations to study the interaction of nano-objects and surfactants with lipid bilayers for industrial applications (e.g. soaps, detergents, etc.), Massimo Noro has made considerable contributions to the development and application of the Dissipative Particle Dynamics (DPD) simulation technique to study soft condensed matter systems.
Former science leader of the High Performance Computing division at Unilever and current Director of Business Development at the Science and Technology Facilities Council (STFC), with a focus on the Daresbury Campus (see short bio below). Massimo is also a member of E-CAM’s Executive Board. In this interview, he will talk about his journey from academic research, to work in Unilever and now at STFC, and will share his insights on the use of simulation and modelling in industry and the role of STFC and research in this regard.
Watch Massimo Noro’s reply to three key questions of this interview:
Tell us about your journey from academic research, to work in Unilever and now at STFC
What are the key ingredients for the successful relationship between STFC and Industry
What do you think are the most important HPC needs for industry
Full video interview is available here, with the following outline:
What is the importance of diversity on the work space
Massimo Noro
Massimo Noro is the Director of Business Development at the Science & Technology Facilities Council (STFC), with a focus on the Daresbury Campus. His role is to ensure the continued growth and success of the Daresbury Laboratory at the Sci-Tech Daresbury Campus.
Massimo joined STFC in February 2018, following a successful industrial R&D career at Unilever with a proven track record as program and people leader in a corporate environment – Unilever is a large multinational and a market leader in home care, personal care, refreshments and foods products. He gained considerable experience in managing high-budget projects and in leading teams across sites and across complex organisations. Massimo leads on strategic partnerships with industry and local government; he manages a wide team to deliver innovation, to develop strong pipelines of commercial engagements and to provide a range of offerings for business incubation.
E-CAM has built up a collection of (hopefully) useful information to help our community, other Centres of Excellence, and interested groups, transition to online training. The information originates from community-contributed sources and by directly sharing our experience in capturing and broadcasting E-CAM training events. Guides to help with online training are being rapidly created as the CoVid-19 crises evolves, and we try to keep the information here moderated to avoid overwhelming people.