Integration of ESL modules into electronic-structure codes


If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

The evolutionary pressure on electronic structure software development is greatly increasing, due to the emergence of new paradigms, new kinds of users, new processes, and new tools. Electronic structure software complexity is consequently also increasing, requiring a larger effort on code maintenance. Developers of large electronic structure codes are trying to relieve some of this complexity by moving standardized algorithms into separate libraries [BigDFT-PSolver, ELPA, ELSI, LibXC, LibGridXC, etc.]. This paradigm shift requires library developers to have a hybrid profile in which scientific and computational skills are equally important. These topics have been extensively and publicly discussed among developers of various projects, including ABINIT, ASE, ATK, BigDFT, CASTEP, FHI-aims, GPAW, Octopus, Quantum Espresso, SIESTA, and SPR-KKR.

High-quality standardized libraries are not only a highly challenging effort resting in the hands of the library developers; they also give codes the opportunity to access commonly used algorithms in a standard way. Integrating these libraries, however, requires a significant initial effort that is often sacrificed in favour of new developments, which frequently do not even reach the mainstream branch of the code. Additionally, there are multiple challenges in adopting new libraries, rooted in a variety of issues: installation, data structures, physical units and parallelism – all of which are code-dependent. On the other hand, adopting common libraries ensures the immediate propagation of improvements within the respective library’s field of research and keeps codes up to date with much less effort [LibXC]. Indeed, well-established libraries can have a huge impact on multiple scientific communities at once [PETSc].
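To illustrate what such a standard access path can look like in practice, the sketch below evaluates the PBE exchange functional through LibXC's Python bindings (pylibxc). The functional name, input keys and availability of the bindings are assumptions based on typical LibXC usage and should be checked against the library's own documentation.

    # Minimal sketch (assumed pylibxc API): evaluate the PBE exchange functional
    # on a few grid points through LibXC's Python bindings.
    import numpy as np
    import pylibxc  # Python bindings shipped with recent LibXC releases (assumed available)

    # Spin-unpolarised GGA functional; the identifier follows LibXC naming conventions.
    func = pylibxc.LibXCFunctional("gga_x_pbe", "unpolarized")

    # Densities and reduced gradients (sigma = |grad rho|^2) on a toy grid.
    inp = {
        "rho": np.array([0.1, 0.2, 0.3]),
        "sigma": np.array([0.01, 0.02, 0.03]),
    }

    out = func.compute(inp)
    print(out["zk"])    # energy density per particle
    print(out["vrho"])  # functional derivative with respect to the density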

In the Electronic Structure community, two issues are emerging. First, libraries are being developed [esl, esl-gitlab], but they require an ongoing commitment from the community to share the maintenance and development effort. Secondly, existing codes will only benefit from these libraries by adopting them. Both issues are mainly governed by the exposure of the libraries and the availability of library core developers, who are typically researchers pressured by publication deliverables and fund-raising burdens. They are thus not able to commit a large fraction of their time to software development.

An effort to allow code developers to make use of, and develop, shared components is needed. This requires efficient coordination of several elements:

– A common and consistent code-development infrastructure and education, covering compilation, installation, testing and documentation.
– Guidance on how to use and integrate already published libraries into existing projects.
– Long-lasting synergies between developers, to reach a “critical mass” of component contributors.
– Relevant quality metrics (“TRLs” and “SRLs”), to provide businesses with useful information.

This is what the Electronic Structure Library (ESL) [esl, esl-gitlab] has been doing since 2014, with a wiki, a data-exchange standard, the refactoring of code of global interest into integrated modules, and regularly organized workshops, within a wider movement led by the European eXtreme Data and Computing Initiative [exdci].

 

References

[BigDFT-PSolver] http://bigdft.org/Wiki/index.php?title=The_Solver_Package
[ELPA] https://gitlab.mpcdf.mpg.de/elpa/elpa
[ELSI] http://elsi-interchange.org
[LibXC] http://www.tddft.org/programs/libxc/
[LibGridXC] https://launchpad.net/libgridxc
[PETSc] https://www.mcs.anl.gov/petsc/
[esl] http://esl.cecam.org/
[esl-gitlab] http://gitlab.e-cam2020.eu/esl
[exdci] https://exdci.eu/newsroom/press-releases/exdci-towards-common-hpc-strategy-europe


Extended Software Development Workshop: Mesoscopic simulation models and High-Performance Computing


If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

In Discrete Element Methods (DEM), the equations of motion of a large number of particles are numerically integrated to obtain the trajectory of each particle [1]. The collective movement of the particles very often gives the system complex, unpredictable dynamics that are inaccessible to any mean-field approach. Such phenomenology is present, for instance, in seemingly simple systems such as the hopper/silo, where intermittent flow accompanied by random clogging occurs [2]. With the development of computing power, alongside that of numerical algorithms, it has become possible to simulate such scenarios, following the trajectories of millions of spherical particles for a limited simulation time. Incorporating more complex particle shapes [3] or the influence of the interstitial medium [4] rapidly decreases the accessible number of particles.
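As a hedged illustration of the time stepping underlying DEM, the sketch below integrates spherical particles with a velocity-Verlet scheme and a simple linear-spring normal contact force; the parameters and the O(N^2) contact search are toy choices, not a production contact model.

    # Hedged DEM sketch: velocity-Verlet integration of spherical particles with a
    # linear-spring normal contact force (toy parameters, O(N^2) contact search).
    import numpy as np

    def contact_forces(pos, radius, k=1.0e4):
        """Pairwise linear-spring repulsion whenever two spheres overlap."""
        forces = np.zeros_like(pos)
        n = len(pos)
        for i in range(n):
            for j in range(i + 1, n):
                rij = pos[j] - pos[i]
                dist = np.linalg.norm(rij)
                overlap = 2.0 * radius - dist
                if overlap > 0.0:            # particles are in contact
                    normal = rij / dist
                    f = k * overlap * normal
                    forces[i] -= f
                    forces[j] += f
        return forces

    def dem_step(pos, vel, mass, radius, dt, gravity=np.array([0.0, -9.81, 0.0])):
        """One velocity-Verlet step for all particles (positions of shape (N, 3))."""
        acc = contact_forces(pos, radius) / mass + gravity
        pos_new = pos + vel * dt + 0.5 * acc * dt**2
        acc_new = contact_forces(pos_new, radius) / mass + gravity
        vel_new = vel + 0.5 * (acc + acc_new) * dt
        return pos_new, vel_new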

Another class of computer simulations enjoying huge popularity in the science and engineering communities is Computational Fluid Dynamics (CFD). A tractable approach to such simulations is the family of Lattice Boltzmann Methods (LBMs) [5]. There, instead of directly solving the strongly non-linear Navier-Stokes equations, the discrete Boltzmann equation is solved to simulate the flow of Newtonian or non-Newtonian fluids with the appropriate collision models [6,7]. The method closely resembles DEM, as it simulates the streaming and collision processes across a limited number of intrinsic particles, which evince viscous flow applicable across the greater mass.
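The streaming-and-collision cycle can be written down very compactly; the sketch below is a hedged, minimal D2Q9 lattice Boltzmann step with a BGK (single-relaxation-time) collision operator and periodic boundaries, omitting everything a real solver needs (boundary conditions, forcing, stability checks).

    # Hedged LBM sketch: one streaming + BGK collision step on a D2Q9 lattice with
    # periodic boundaries. The distribution array f has shape (9, nx, ny).
    import numpy as np

    # D2Q9 lattice velocities and weights
    c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
                  [1, 1], [-1, 1], [-1, -1], [1, -1]])
    w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

    def equilibrium(rho, u):
        """Second-order equilibrium distributions for each lattice direction."""
        feq = np.empty((9,) + rho.shape)
        usq = u[0]**2 + u[1]**2
        for k in range(9):
            cu = c[k, 0] * u[0] + c[k, 1] * u[1]
            feq[k] = w[k] * rho * (1.0 + 3.0 * cu + 4.5 * cu**2 - 1.5 * usq)
        return feq

    def lbm_step(f, tau=0.6):
        # macroscopic moments
        rho = f.sum(axis=0)
        u = np.tensordot(c.T, f, axes=([1], [0])) / rho
        # collision: relax towards the local equilibrium
        f = f - (f - equilibrium(rho, u)) / tau
        # streaming: shift each population along its lattice velocity
        for k in range(9):
            f[k] = np.roll(np.roll(f[k], c[k, 0], axis=0), c[k, 1], axis=1)
        return f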

As both methods have gained popularity for solving engineering problems, and scientists have become more aware of finite-size effects, the size and time requirements for simulating practically relevant systems with these methods have grown beyond the capabilities of even the most modern CPUs [8,9]. Massive parallelization is thus becoming a necessity. This is naturally offered by graphics processing units (GPUs), making them an attractive alternative for running these simulations, which consist of a large number of relatively simple mathematical operations readily implemented on a GPU [8,9].

 

References

[1] P.A. Cundall and O.D.L. Strack, Geotechnique 29, 47–65 (1979).
[2] H.G. Sheldon and D.J. Durian, Granular Matter 6, 579–585 (2010).
[3] A. Khazeni and Z. Mansourpour, Powder Tech. 332, 265–278 (2018).
[4] J. Koivisto, M. Korhonen, M.J. Alava, C.P. Ortiz, D.J. Durian, and A. Puisto, Soft Matter 13, 7657–7664 (2017).
[5] S. Succi, The Lattice Boltzmann Equation: For Fluid Dynamics and Beyond, Oxford University Press (2001).
[6] L.S. Luo, W. Liao, X. Chen, Y. Peng, and W. Zhang, Phys. Rev. E 83, 056710 (2011).
[7] S. Gabbanelli, G. Drazer, and J. Koplik, Phys. Rev. E 72, 046312 (2005).
[8] N. Govender, R.K. Rajamani, S. Kok, and D.N. Wilke, Minerals Engin. 79, 152–168 (2015).
[9] P.R. Rinaldi, E.A. Dari, M.J. Vénere, and A. Clausse, Simulation Modelling Practice and Theory 25, 163–171 (2012).


Inverse Molecular Design & Inference: building a Molecular Foundry


If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

The overarching theme of this proposed E-CAM Transverse Extended Software Development Workshop is the design and control of molecular machines including sensors, enzymes, therapeutics, and transporters built as fusion proteins or nanocarrier-protein complexes, and in particular, the software development and interfacing that this entails. Several immuno-diagnostic companies and molecular biology experimental groups have expressed a strong interest in the projects at the core of this proposal. The proposed ESDW is transverse as it entails the use of methodologies from two E-CAM Scientific Workpackages: WP1 (Advanced MD/rare-events methods) and WP4 (Mesoscale/Multiscale simulation).

Fusion proteins are sets of two or more protein modules linked together, where the underlying genetic codes of each module and of the fusion protein itself are known or can easily be inferred. The fusion protein typically retains the functions of its components and in some cases gains additional functions. They occur in nature, but can also be made artificially using genetic engineering and biotechnology, and are used in a wide variety of settings ranging from unimolecular FRET sensors and novel immuno-based cancer drugs to enzymes [1,2] and energy conversion (for example, the efficient generation of alcohol from cellulose) [3,4]. Fusion proteins can be expressed using genetic engineering in cell lines and purified for in-vitro use using biotechnology. Much of the design work is focused on how different modules are optimally linked or fused together via suitable peptides, rather than on internal changes to the modules. Optimizing such designs experimentally can be done through, for example, random mutations, but a more controlled approach based on the underlying molecular mechanisms is desirable, for which a pragmatic multiscale approach is ideally suited, combining bioinformatics and homology, coarse-graining, detailed MD and rare-event methods, and machine learning. The figure on the front of this proposal is a representative example of a fusion protein sensor designed to bind to a specific RNA sub-sequence, which causes an optimized hinge-like protein to close and, in the process, bring two fluorescent proteins together, allowing the binding event to be observed optically through FRET microscopy.

Nanocarriers (NCs) are promising tools for cancer immunotherapy and other diagnostic and therapeutic applications. NCs can be decorated on their surface with molecules that facilitate target-specific antigen delivery to certain antigen-presenting cell types or tumor cells. However, the target cell-specific uptake of nano-vaccines is highly dependent on modifications of the NC itself. One of these is the formation of a protein corona [5] around the NC after in vivo administration. Appropriate targeting of NCs can be affected by unintended interactions of the NC surface with components of blood plasma and/or with cell surface structures that are unrelated to the specific targeting structure. The protein corona around NCs may affect their organ-specific or cell type-specific trafficking, as well as endocytosis and/or functional properties of the NC. Most importantly, the protein corona has been shown to interfere with targeting moieties used to induce receptor-mediated uptake of the NC, both inhibiting and enhancing internalization by specific cell types [5]. Moreover, the protein corona is taken up by the target cell, which may alter its function. Therefore, tailoring the surface properties of the NC to facilitate the adsorption of specific proteins and to control the structure of the corona can help to significantly improve NC performance. Modification of surface properties, e.g. via grafting oligomers, is also known to affect the preferred orientation of adsorbed proteins and, therefore, their functionality [6]. The molecular design would include the selection of an appropriate NC coating and antibody type to optimize NC uptake.

Mesoscale simulation is required to understand the thermodynamics and kinetics of protein adsorption on the NCs with engineered surfaces [7] and to achieve the desired structure with preferred adsorption of the selected antigen. However, the aforementioned issues often require biological and chemical accuracy that typical mesoscale models cannot achieve unless buttressed by accurate simulations at an atomistic/molecular level, rare-event methods and machine learning.

A pragmatic approach towards the enhancement of fusion proteins and NCs is as follows.

(i) Molecular designs are initially developed and optimized as simple CG models, making use of information theory and machine learning.

(ii) Solving the inverse problem of building the fusion protein or the NC-protein complex to match the design requires a multiscale approach combining mesoscale modeling, molecular dynamics, rare-event methods, machine learning, homology, mutation and solvent conditions.

(iii) Iterate steps (i) and (ii) to optimize the design, and in the process collect data for machine learning driven design.

(iv) Final validation using detailed MD, rare-event methods and HPC.

The planned ESDW will, over the course of two 5-day meetings separated by several intervening months, produce multiple software modules, including the following.
(a) C/C++/modern Fortran or Python-based codes to build and optimize simple CG models of fusion proteins or NC-protein complexes using information theory and machine learning.

(b) Semi-automated pipelines to solve the inverse problem of building the fusion protein or the NC to match the design. This will involve interfacing with MD/mesoscale engines such as LAMMPS, GROMACS, OpenMM and ESPResSo, rare-event based packages such as PLUMED, and bioinformatics codes such as I-TASSER and IntFOLD.

(c) Particle insertion/deletion methods for alchemistry – mutation of amino acids, changes in the solvent and associated changes in free energy properties.

(d) Codes to add corrections to coarse-grained models (bead models/MARTINI) using detailed atomistic data (e.g. potentials of mean force for key order parameters, structure factors, etc.) or experimental data where available; a minimal sketch of this idea follows below.
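As a hedged, minimal sketch of item (d), the snippet below estimates a potential of mean force for an order parameter by Boltzmann inversion of a histogram of atomistic samples; the units and the order parameter itself are illustrative assumptions.

    # Hedged sketch: Boltzmann inversion of a sampled order parameter q to obtain
    # a potential of mean force, F(q) = -kB*T*ln P(q), usable as a target when
    # correcting a coarse-grained model.
    import numpy as np

    def pmf_from_samples(q_samples, temperature, nbins=50):
        kB = 0.0083144621                     # kJ/(mol K); pick units to match the data
        hist, edges = np.histogram(q_samples, bins=nbins, density=True)
        centers = 0.5 * (edges[:-1] + edges[1:])
        pmf = np.full_like(hist, np.nan)
        mask = hist > 0                       # avoid log(0) in empty bins
        pmf[mask] = -kB * temperature * np.log(hist[mask])
        pmf -= np.nanmin(pmf)                 # shift the minimum to zero
        return centers, pmf

    # Usage: q extracted from an atomistic trajectory (e.g. an end-to-end distance)
    # centers, pmf = pmf_from_samples(q, temperature=300.0)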

While this is an ambitious plan, it is worth pointing out that a similar integrated approach to protein development was already taken by the lab of John Chodera [8]. While it did not focus on fusion proteins or NC-protein complexes, nor systematically incorporate coarse-graining, it demonstrates both the feasibility of what we propose here and how to achieve practical solutions. Other ideas for a systematic approach to molecular design using MD simulation have also been proposed recently [9,10].

 

References

[1] H. Yang et al, The promises and challenges of fusion constructs in protein biochemistry and enzymology, Appl Microbiol Biotechnol (2016)
[2] A. Bochicchio et al, Designing the Sniper: Improving Targeted Human Cytolytic Fusion Proteins for Anti-Cancer Therapy via Molecular Simulation, Biomedicines 5(1), 9 (2017)
[3] Y. Fujita et al, Direct and Efficient Production of Ethanol from Cellulosic Material with a Yeast Strain Displaying Cellulolytic Enzymes, Appl Environ Microbiol. 68(10): 5136–5141 (2002)
[4] M. Gunnoo et al, Nanoscale Engineering of Designer Cellulosomes, Adv. Mater. 28(27), 5619 (2016)
[5] M. Bros et al. The Protein Corona as a Confounding Variable of Nanoparticle-Mediated Targeted Vaccine Delivery, Front. Immunol. 9, 1760 (2018).
[6] I. Lieberwirth et al. The Role of the Protein Corona in the Uptake Process of Nanoparticles, 24, Supplement S1, Proceedings of Microscopy & Microanalysis (2018)
[7] H Lopez et al. Multiscale Modelling of Bionano Interface, Adv. Exp. Med. Biol. 947, 173-206 (2017)
[8] DL. Parton et al Ensembler: Enabling High-Throughput Molecular Simulations at the Superfamily Scale. PLoS Comput Biol 12(6): e1004728, (2016)
[9] PV. Komarov et al. A new concept for molecular engineering of artificial enzymes: a multiscale simulation, Soft Matter 12, 689-704 (2016)
[10] B.A. Thurston et al. Machine learning and molecular design of self-assembling π-conjugated oligopeptides, Mol. Sim. 44, 930-945 (2018)
[11] D. Carroll. Genome Engineering with Targetable Nucleases, Annu. Rev. Biochem. 83:409–39 (2014)


Extended software development workshop in quantum dynamics


If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

Quantum molecular dynamics simulations are pivotal to understanding and predicting the microscopic details of molecules, and rely strongly on a combined theoretical and computational effort. When considering molecular systems, the complexity of the underlying equations is such that approximations have to be devised, and the resulting theories need to be translated into algorithms and computer programs for numerical simulations. In the last decades, the joint effort of theoretical physicists and quantum chemists around the challenges of quantum dynamics has made it possible to investigate the quantum dynamics of complex molecular systems, with applications in energy conversion, energy storage, organic electronics, light-emitting devices, biofluorescent molecules and photocatalysis, to name a few.
Two different strategies have been successfully applied to perform quantum molecular dynamics: wavepacket propagation and trajectory-based methods. The first family of methods includes all quantum nuclear effects, but its computational cost hampers the simulation of systems with more than a moderate number (10-12) of degrees of freedom. The multi-configuration time-dependent Hartree (MCTDH) method constitutes one of the most successful developments in this field and is often considered a gold standard for quantum dynamics [1]. Other strategies for wavepacket propagation try to optimize the “space” in which the wavefunction information is computed, such that Cartesian grids can be replaced with Smolyak grids [2]. The second family of methods introduces the idea of trajectories as a way to approximate the nuclear subsystem, either classically or semiclassically, and is exemplified by the trajectory surface hopping and Ehrenfest schemes [3], or by the more accurate coupled-trajectory mixed quantum-classical (CT-MQC) [4] and quantum-classical Liouville equation (QCLE) [5] methods.
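To make the wavepacket-propagation idea concrete in its very simplest form, the hedged sketch below propagates a 1D Gaussian wavepacket on a single potential energy surface with a split-operator scheme in atomic units; MCTDH and Smolyak-grid methods go far beyond this toy example.

    # Hedged sketch: split-operator propagation of a 1D wavepacket on a single
    # potential energy surface (atomic units, toy grid).
    import numpy as np

    def split_operator_propagate(psi, x, V, dt, mass=1.0, nsteps=100):
        dx = x[1] - x[0]
        k = 2.0 * np.pi * np.fft.fftfreq(len(x), d=dx)   # momentum-space grid
        expV = np.exp(-0.5j * V * dt)                    # half-step potential propagator
        expT = np.exp(-0.5j * k**2 * dt / mass)          # full-step kinetic propagator
        for _ in range(nsteps):
            psi = expV * psi
            psi = np.fft.ifft(expT * np.fft.fft(psi))
            psi = expV * psi
        return psi

    # Usage sketch: Gaussian wavepacket in a harmonic well
    x = np.linspace(-10.0, 10.0, 512)
    V = 0.5 * x**2
    psi0 = np.exp(-(x - 1.0)**2)
    psi0 = psi0 / np.sqrt(np.trapz(np.abs(psi0)**2, x))  # normalize on the grid
    psi_t = split_operator_propagate(psi0, x, V, dt=0.01, nsteps=500)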
From a computational perspective, both families of methods require extensive electronic structure calculations, as the nuclei move under the effect of the electronic subsystem, either “statically” occupying its ground state or “dynamically” switching between excited states. Solving the quantum nuclear dynamics equations also becomes very expensive in itself in the case of wavepacket propagation methods. Contrary to other, more consolidated, areas of modeling, quantum dynamics simulations do not benefit from established community packages, and most progress is based on in-house codes that are difficult to maintain and limited in optimization and portability. One of the core actions of E-CAM has been to seed a change in this situation, by promoting systematic software development, providing a repository to host and share code, and fostering collaborations on adding functionalities and improving the performance of common software scaffolds for wavepacket (Quantics) and trajectory-based (PaPIM) packages. Collaborations on the development of other codes have also been initiated. This workshop aims at continuing and extending these activities based on input from the community.

 

References

[1] H. D. Meyer, U. Manthe, L. S. Cederbaum. Chem. Phys. Lett. 165 (1990) 73.
[2] D. Lauvergnat, A. Nauts. Spectrochimica Acta Part A 119 (2014) 18.
[3] J. C. Tully. Faraday Discuss. 110 (1998) 407.
[4] S. K. Min, F. Agostini, I. Tavernelli, E. K. U. Gross. J. Phys. Chem. Lett. 8 (2017) 3048.
[5] R. Kapral. Annu. Rev. Phys. Chem. 57 (2006) 129.


ESDW: Topics in Classical MD


If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

Classical molecular dynamics (MD) is a broad field, with many domains of expertise. Those specialist domains include topics like transition path sampling (which harvests many examples of a process in order to study it at a statistical level [1]), metadynamics (which runs a trajectory with modified dynamics that enhance sampling, and from which free energy profiles can be constructed [2]), as well as various topics focused on the underlying dynamics, either by providing better representations of the interactions between atoms (e.g., force fields [3] or neural network potentials [4]) or by changing the way the dynamics are performed (e.g., integrators [5]).
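As a hedged illustration of the metadynamics idea mentioned above, the sketch below accumulates repulsive Gaussians along a one-dimensional collective variable and recovers a free-energy estimate as the negative of the deposited bias; production work would use PLUMED rather than code like this, and the hill parameters are arbitrary.

    # Hedged sketch of the core metadynamics bookkeeping: Gaussian hills are
    # deposited along a collective variable s, and the free-energy profile is
    # estimated (up to a constant) as the negative of the accumulated bias.
    import numpy as np

    class MetadynamicsBias:
        def __init__(self, height=1.0, sigma=0.1):
            self.height, self.sigma = height, sigma
            self.centers = []                 # CV values at which hills were deposited

        def add_hill(self, s):
            self.centers.append(float(s))

        def bias(self, s):
            """Total bias V(s) from all deposited Gaussians."""
            s = np.atleast_1d(np.asarray(s, dtype=float))
            if not self.centers:
                return np.zeros_like(s)
            c = np.asarray(self.centers)
            return (self.height * np.exp(-0.5 * ((s[:, None] - c) / self.sigma) ** 2)).sum(axis=1)

        def free_energy(self, grid):
            """Estimate F(s) ~ -V(s) once the bias has (approximately) converged."""
            return -self.bias(grid)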

Frequently, experts in one domain are not experienced with the software of other domains. This workshop aims to combine both depth, by extending domain-specific software, and breadth, by providing participants an opportunity to learn about software from other domains. As an extended software development workshop (ESDW), a key component of the workshop will be the development of modules that extend existing software packages. Ideally, some modules may connect multiple domain-specific packages.

Topics at this workshop will include using and extending modern MD software in the domains of:

* advanced path sampling methods (and the software package OpenPathSampling)
* metadynamics and the calculation of collective variables (and the software package PLUMED)
* machine learning for molecular dynamics simulations (including local structure recognition and representation of potential energy surfaces)

In addition, this workshop will feature an emphasis on performance testing and benchmarking software, with particular focus on high performance computing. This subject is relevant to all specialist domains.

By combining introductions to software from different specialist fields with an opportunity to extend domain-specific software, this workshop is intended to provide opportunities for cross-pollination between domains that often develop independently.

References

[1] Bolhuis, P.G. and Dellago, C. Trajectory‐Based Rare Event Simulations. Reviews in Computational Chemistry, 27, p. 111 (2010).
[2] A. Laio and F.L. Gervasio. Rep. Prog. Phys. 71, 126601 (2008).
[3] J.A. Maier, C. Martinez, K. Kasavajhala, L. Wickstrom, K.E. Hauser, and C. Simmerling. J. Chem. Theory. Comput. 11, 3696 (2015).
[4] T. Morawietz, A. Singraber, C. Dellago, and J. Behler. Proc. Natl. Acad. Sci USA, 113, 8368 (2016).
[5] B. Leimkuhler and C. Matthews. Appl. Math. Res. Express, 2013, 34 (2013).


Topics in Classical MD – Extended Software Development Workshop

E-CAM is organising an Extended Software Development Workshop on Topics in Classical MD from 3 to 12 April 2019, a major coding initiative that will combine lectures, coding sessions and hands-on training.

Topics at this workshop will include using and extending modern MD software in the domains of:

  • advanced path sampling methods (and the software package OpenPathSampling)
  • metadynamics and the calculation of collective variables (and the software package PLUMED)
  • machine learning for molecular dynamics simulations (including local structure recognition and representation of potential energy surfaces).

In addition, this workshop will feature an emphasis on performance testing and benchmarking software, with particular focus on high performance computing.

This is a great opportunity to bring your software development project in any specialist domain of classical MD and spend two weeks in the beautiful city of Lyon with peers and experienced coders. Find more information and apply through the CECAM website at: https://www.cecam.org/workshop-1802.html.


Extended Software Development Workshop: Scaling Electronic Structure Applications

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

The evolutionary pressure on electronic structure software development is greatly increasing, due to the emergence of new paradigms, new kinds of users, new processes, and new tools. The large, feature-full codes that were once developed within one field are now undergoing heavy restructuring to reach much broader communities, including companies and non-scientific users[1]. More and more use cases and workflows are performed by highly automated frameworks instead of humans: high-throughput calculations and computational materials design[2], large data repositories[3], and multiscale/multi-paradigm modeling[4], for instance. At the same time, High-Performance Computing Centers are paving the way to exascale, with a cascade of effects on how to operate, from computer architectures[5] to application design[6]. The disruptive paradigm of quantum computing is also putting a big question mark over the relevance of all the ongoing efforts[7].

All these trends are highly challenging for the electronic structure community. Computer architectures have become rapidly moving targets, forcing a global paradigm shift[8]. As a result, long-ignored yet well-established good software practices, summarised in the Agile Manifesto[9] nearly 20 years ago, are now being adopted at an accelerating pace by more and more software projects[10]. With time, this kind of migration is becoming a question of survival, the key to a successful transformation being to enable and preserve enhanced collaboration among the increasing number of disciplines involved. Significant integration efforts from code developers are also necessary, since both hardware and software paradigms have to change at once[11].

Two major issues are also emerging from the community itself. Hybrid developer profiles, with people fluent in both computational and scientific matters, are still difficult to find and retain. In the long run, the numerous ongoing training initiatives will gradually improve the situation, while in the short run the issue is becoming more salient and painful, because the context evolves faster than ever. Good practices have usually been the first element sacrificed in the “publish or perish” race. New features have usually been bound to the duration of a post-doc contract and left undocumented and poorly tested, favoring the unsustainable “reinventing the wheel” syndrome.

Addressing these issues requires coordinated efforts at multiple levels:
– from a methodological perspective, mainly through the creation of open standards and the use of co-design, both for programming and for data[12];
– regarding documentation, with a significant leap in content policies, helped by tools like Doxygen and Sphinx, as well as publication platforms like ReadTheDocs[13];
– for testing, by introducing test-driven development concepts and systematically publishing test suites together with software[14];
– considering deployment, by creating synergies with popular software distribution systems[15];
– socially, by disseminating the relevant knowledge and training the community, through the release of demonstrators and giving all stakeholders the opportunity to meet regularly[16].

This is what the Electronic Structure Library (ESL)[17] has been doing since 2014, with a wiki, a data-exchange standard, the refactoring of code of global interest into integrated modules, and regularly organised workshops, within a wider movement led by the European eXtreme Data and Computing Initiative (EXDCI)[18].

Since 2014, the Electronic Structure Library has been steadily growing and developing to cover most fundamental tasks required by electronic structure codes. In February 2018 an extended software development workshop will be held at CECAM-HQ with the purpose of building demonstrator codes providing powerful, non-trivial examples of how the ESL libraries can be used. These demonstrators will also provide a platform to test the performance and usability of the libraries in an environment as close as possible to real-life situations. This marks a milestone and enables the next step in the ESL development: going from a collection of libraries with a clear set of features and stable interfaces to a bundle of highly efficient, scalable and integrated implementations of those libraries.

Many libraries developed within the ESL perform low-level tasks or very specific steps of more complex algorithms and are not capable, by themselves, of reaching exascale performance. Nevertheless, if they are to be used as efficient components of exascale codes, they must provide some level of parallelism and be as efficient as possible on a wide variety of architectures. During this workshop, we propose to perform advanced performance and scalability profiling of the ESL libraries. With that knowledge in hand, it will be possible to select and implement the best strategies for parallelizing and optimizing the libraries. Assistance from HPC experts will be essential and provides a unique opportunity to foster collaborations with other Centres of Excellence, such as PoP (https://pop-coe.eu/) and MaX (http://www.max-centre.eu/).

Based on the successful experience of the previous ESL workshops, we propose to divide the workshop into two parts. The first two days will be dedicated to initial discussions between the participants and other invited stakeholders, and to presentations on state-of-the-art methodological and software developments, performance analysis and scalability of applications. The remainder of the workshop will consist of a 12-day coding effort by a smaller team of experienced developers. Both the discussions and the software development will take advantage of the ESL infrastructure (wiki, GitLab, etc.) that was set up during the previous ESL workshops.

[1] See http://www.nanogune.eu/es/projects/spanish-initiative-electronic-simulations-thousands-atoms-codigo-abierto-con-garantia-y
[2] See http://pymatgen.org/ and http://www.aiida.net/ for example.
[3] http://nomad-repository.eu/
[4] https://abidev2017.abinit.org/images/talks/abidev2017_Ghosez.pdf
[5] http://www.deep-project.eu/
[6] https://code.grnet.gr/projects/prace-npt/wiki/StarSs
[7] https://www.newscientist.com/article/2138373-google-on-track-for-quantum-computer-breakthrough-by-end-of-2017/
[8] https://arxiv.org/pdf/1405.4464.pdf (sustainable software engineering)
[9] http://agilemanifesto.org/
[10] Several long-running projects routinely use modern bug trackers and continuous integration, e.g.: http://gitlab.abinit.org/, https://gitlab.com/octopus-code/octopus, http://qe-forge.org/, https://launchpad.net/siesta
[11] Transition of HPC Towards Exascale Computing, Volume 24 of Advances in Parallel Computing, E.H. D’Hollander, IOS Press, 2013, ISBN: 9781614993247
[12] See https://en.wikipedia.org/wiki/Open_standard and https://en.wikipedia.org/wiki/Participatory_design
[13] See http://www.doxygen.org/, http://www.sphinx-doc.org/, and http://readthedocs.org/
[14] See https://en.wikipedia.org/wiki/Test-driven_development and http://agiledata.org/essays/tdd.html
[15] See e.g. http://www.etp4hpc.eu/en/esds.html
[16] See e.g. https://easybuilders.github.io/easybuild/, https://github.com/LLNL/spack, https://github.com/snapcore/snapcraft, and https://www.macports.org/ports.php?by=category&substr=science
[17] http://esl.cecam.org/
[18] https://exdci.eu/newsroom/press-releases/exdci-towards-common-hpc-strategy-europe


Scientific reports from the 2018 E-CAM workshops are now available on our website

 

The scientific reports* from the following workshops conducted in year 3 of the project E-CAM (2018):

  1. E-CAM Scoping Workshop: “Solubility prediction”, 14 – 15 May 2018, Ecole Normale Supérieure de Lyon, France,
  2. E-CAM Scoping Workshop: “Dissipative particle dynamics: Where do we stand on predictive application?”, 24 – 26 April 2018, Daresbury Laboratory, United Kingdom,
  3. E-CAM Extended Software Development Workshop 11: “Quantum Dynamics”, 18 – 29 June 2018, Maison de la Simulation, France,

are now available for download on our website at this location. Furthermore, they will also be integrated in the CECAM Report of Activities for 2018, published every year on the website www.cecam.org.

 

*© CECAM 2018, all rights reserved.

Please address any comments or questions to info@e-cam2020.eu.


Extended Software Development Workshop: Atomistic, Meso- and Multiscale Methods on HPC Systems

If you are interested in attending this event, please visit the CECAM website here. This is a multi-part event and we indicate the date of the first meeting. Dates of the follow-ups are decided during the first event.

Workshop Description

E-CAM is an EINFRA project funded by H2020. Its goal is to create, develop, and sustain a European infrastructure for computational science, applied to simulation and modelling of materials and biological processes that are of industrial and societal interest. E-CAM builds upon the considerable European expertise and capability in this area.

E-CAM is organized around four scientific areas: molecular dynamics, electronic structure, quantum dynamics, and meso- and multiscale modelling, corresponding to work packages WP1-WP4. E-CAM gathers a number of groups with complementary expertise in the area of meso- and multiscale modelling and also has well-established contacts with simulation code developers. Among the aims of the groups involved in this area are producing a software stack by combining software modules, and further developing existing simulation codes towards highly scalable applications on high-performance computer architectures. It has been identified as a key issue that simulation codes in the fields of molecular dynamics and meso- and multiscale applications should be prepared for the upcoming HPC architectures. Different approaches have been proposed by the E-CAM WPs: (i) developing and optimizing highly scalable applications, running a single application on a large number of cores, and (ii) developing micro-schedulers for task-farming approaches, where multiple simulations each run on smaller partitions of a large HPC system and work together on the collection of statistics or the sampling of a parameter space, for which only loosely coupled simulations are needed. Both approaches rely on the efficient implementation of simulation codes.

Concerning strategy, most modern parallelized (classical) particle simulation programs are based on a spatial decomposition method as the underlying parallel algorithm. In this case, different processors administer different spatial regions of the simulation domain and keep track of those particles that are located in their respective regions. Processors exchange information (i) in order to compute interactions between particles located on different processors, and (ii) to exchange particles that have moved to a region administered by a different processor. This implies that the workload of a given processor is largely determined by its number of particles, or, more precisely, by the number of interactions that have to be evaluated within its spatial region.

Certain systems of high physical and practical interest (e.g. condensing fluids) dynamically develop into a state where the distribution of particles becomes spatially inhomogeneous. Unless special care is taken, this results in a substantially inhomogeneous distribution of the processors’ workload. Since the work usually has to be synchronized between the processors, the runtime is determined by the slowest processor (i.e. the one with the highest workload). In the extreme case, this means that a large fraction of the processors is idle during these waiting times. The problem becomes particularly severe if one aims at strong scaling, where the number of processors is increased at constant problem size: every processor administers smaller and smaller regions, and inhomogeneities therefore become more and more pronounced. This eventually saturates the scalability of a given problem, possibly already at a processor count small enough that the communication overhead is still negligible.

The solution to this problem is the inclusion of dynamic load balancing techniques. These methods redistribute the workload among the processors, lowering the load of the busiest cores and raising the load of the most idle ones. Fortunately, several successful techniques are already known to put this strategy into practice (see references). Nevertheless, dynamic load balancing that is both efficient and widely applicable implies highly non-trivial coding work. Therefore, it has not yet been implemented in a number of important codes of the E-CAM community, e.g. DL_Meso, DL_Poly, Espresso, Espresso++, to name a few. Other codes (e.g. LAMMPS) have implemented somewhat simpler schemes, which however might turn out to lack sufficient flexibility to accommodate all important cases. Therefore, the present proposal suggests organizing an Extended Software Development Workshop (ESDW) within E-CAM, where developers of CECAM community codes are invited, together with E-CAM postdocs, to work on the implementation of load balancing strategies. The goal of this activity is to increase the scalability of these applications to a larger number of cores on HPC systems, for spatially inhomogeneous systems, and thus to reduce the time-to-solution of the applications.
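A hedged, deliberately minimal sketch of the idea in one dimension is shown below: domain borders are shifted so that each processor receives roughly the same number of particles. Real load-balancing libraries work on 3D decompositions, shift borders incrementally, and handle the associated communication; the function and variable names here are purely illustrative.

    # Hedged 1D load-balancing sketch: place domain borders at particle-count
    # quantiles so that each processor gets a near-equal share of the particles.
    import numpy as np

    def balanced_borders(x_positions, nprocs, xmin, xmax):
        """Return nprocs+1 border positions giving near-equal particle counts."""
        x_sorted = np.sort(np.asarray(x_positions))
        borders = [xmin]
        for p in range(1, nprocs):
            idx = int(round(p * len(x_sorted) / nprocs))
            borders.append(float(x_sorted[min(idx, len(x_sorted) - 1)]))
        borders.append(xmax)
        return np.array(borders)

    def domain_loads(x_positions, borders):
        """Particles per domain, a simple proxy for each processor's workload."""
        counts, _ = np.histogram(x_positions, bins=borders)
        return counts

    # Usage sketch: a strongly inhomogeneous particle distribution
    x = np.concatenate([np.random.normal(2.0, 0.2, 9000), np.random.uniform(0.0, 10.0, 1000)])
    borders = balanced_borders(x, nprocs=8, xmin=0.0, xmax=10.0)
    print(domain_loads(x, borders))   # roughly equal particle counts per domain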

The workshop is intended to be a major community effort in the direction of improving European simulation codes in the field of classical atomistic, mesoscopic and multiscale simulation. Various load balancing techniques will be presented, discussed and selectively implemented into codes. Sample implementations of load balancing techniques have already been made for the codes IMD and MP2C. These are highly scalable particle codes, cf. e.g. http://www.fz-juelich.de/ias/jsc/EN/Expertise/High-Q-Club/_node.html. The technical task is to provide a domain decomposition with flexible adjustment of domain borders. The basic load balancing functionality will be implemented and provided by a library, which will be accessed via interfaces from the codes.

In order to attract both developers of the codes and postdocs working within E-CAM, the workshop will be split into 3 parts:

Part 1: preparation meeting (2 days)
– various types of load balancing schemes will be presented conceptually and examples of implemented techniques will be shown
– code developers / owners will present their codes; functionalities will be presented and parallel implementations will be discussed in view of the technical requirements for implementing load balancing techniques
– an interface definition for exchanging information from a simulation code to a load balancing library will be set up

Part 2: training and implementation (1 week)
– to enable E-CAM postdocs to actively participate in the development, some advanced technical courses on MPI and high-performance C++ will be offered in combination with the PRACE PATC course program at Juelich
– during and after the courses (planned for 2-3 days), participants can start implementing a load balancing scheme into a code
– for those participants who are already on an expert level in HPC techniques, it is possible to start immediately with implementing load balancing schemes

Part 3: implementation and benchmarking (1 week)
– final implementation work, with the goal of having at least one working implementation per code
– for successful implementations, benchmarks will be conducted on Juelich supercomputer facilities

The second part will also be open to a broader community from E-CAM, so that the workshop can have an impact on the HPC training of postdocs in E-CAM, strengthening their skills and experience in HPC.

It is intended that, between the face-to-face parts of the workshop, postdocs and developers continue the preparation and work on the load balancing schemes, so that the meetings serve as important steps to synchronise, exchange information and experience, and improve the current implementations.


Extended Software Development Workshop: Intelligent high throughput computing for scientific applications

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

High throughput computing (HTC) is a computing paradigm focused on the execution of many loosely coupled tasks. It is a useful and general approach to parallelizing (nearly) embarrassingly parallel problems. Distributed computing middleware, such as Celery [1] or COMP Superscalar (COMPSs) [2], can include tools to facilitate HTC, although there may be challenges extending such approaches to the exascale.

Across scientific fields, HTC is becoming a necessary approach in order to fully utilize next-generation computer hardware. As an example, consider molecular dynamics: Excellent work over the years has developed software that can simulate a single trajectory very efficiently using massive parallelization [3]. Unfortunately, for a fixed number of atoms, the extent of possible parallelization is limited. However, many methods, including semiclassical approaches to quantum dynamics [4,5] and some approaches to rare events [6,7], require running thousands of independent molecular dynamics trajectories. Intelligent HTC, which can treat each trajectory as a task and manage data dependencies between tasks, provides a way to run these simulations on hardware up to the exascale, thus opening the possibility of studying previously intractable systems.
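As a hedged sketch of what treating each trajectory as a task can look like with Celery, the snippet below declares a trajectory runner as a Celery task and submits a batch of independent trajectories as a group; the broker URL and the run_single_trajectory() helper are placeholders, not part of any specific E-CAM code.

    # Hedged Celery sketch: treat each independent trajectory as a task and farm
    # out a batch. The broker/backend URLs and run_single_trajectory() are
    # placeholders for a site-specific setup and an actual MD engine call.
    from celery import Celery, group

    app = Celery("trajectories",
                 broker="redis://localhost:6379/0",
                 backend="redis://localhost:6379/0")

    @app.task
    def run_trajectory(seed):
        """One loosely coupled task: run a single trajectory, return its observable."""
        return run_single_trajectory(seed)       # placeholder for the real simulation

    def harvest(n_trajectories=1000):
        # Submit all trajectories as independent tasks and block for the results.
        job = group(run_trajectory.s(seed) for seed in range(n_trajectories))
        return job.apply_async().get()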

In practice, many scientific programmers are not aware of the range of middleware to facilitate parallel programming. When HTC-like approaches are implemented as part of a scientific software project, they are often done manually, or through custom scripts to manage SSH, or by running separate jobs and manually collating the results. Using the intelligent high-level approaches enabled by distributed computing middleware will simplify and speed up development.

Furthermore, middleware frameworks can meet the needs of many different computing infrastructures. For example, in addition to working within a single job on a cluster, COMPSs includes support for working through a cluster’s queueing system or working on a distributed grid. Moreover, architecting a software package such that it can take advantage of one HTC library will make it easy to use other HTC middleware. Having all of these possibilities immediately available will enable developers to quickly create software that can meet the needs of many users.
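The same pattern, expressed with COMPSs through its Python bindings (PyCOMPSs), looks roughly like the hedged sketch below: tasks are declared with a decorator, the runtime tracks dependencies, and results are synchronised explicitly. The decorator and synchronisation call follow the PyCOMPSs documentation as we understand it, and compute_observable() is a placeholder.

    # Hedged PyCOMPSs sketch: the @task decorator marks units of work, the runtime
    # schedules them (locally, through a queueing system, or on a grid), and
    # compss_wait_on() synchronises the results. compute_observable() is a placeholder.
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on

    @task(returns=float)
    def run_trajectory(seed):
        return compute_observable(seed)          # placeholder for the real simulation

    def harvest(n_trajectories=1000):
        futures = [run_trajectory(seed) for seed in range(n_trajectories)]
        return compss_wait_on(futures)           # blocks until all tasks have finished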

This E-CAM Extended Software Development Workshop (ESDW) will focus on intelligent HTC as a technique that crosses many domains within the molecular simulation community in general and the E-CAM community in particular. Teaching developers how to incorporate middleware for HTC matches E-CAM’s goal of training scientific developers on the use of more sophisticated software development tools and techniques.

This E-CAM extended software development workshop (ESDW) will focus on intelligent HTC, with the primary goals being:

1. To help scientific developers interface their software with HTC middleware.
2. To benchmark, and ideally improve, the performance of HTC middleware as applications approach extreme scale.

This workshop will aim to produce four or more software modules related to intelligent HTC, and to submit them, with their documentation, to the E-CAM software module repository. These will include modules adding HTC support to existing computational chemistry codes, where the participants will bring the codes they are developing. They may also include modules adding new middleware or adding features to existing middleware that facilitate the use of HTC by the computational chemistry community. This workshop will involve training both in the general topic of designing software to interface with HTC libraries, and in the details of interfacing with specific middleware packages.

The range of use for intelligent HTC in scientific programs is broad. For example, intelligent HTC can be used to select and run many single-point electronic structure calculations in order to develop approximate potential energy surfaces. Even more examples can be found in the wide range of methods that require many trajectories, where each trajectory can be treated as a task, such as:

* rare events methods, like transition interface sampling, weighted ensemble, committor analysis, and variants of the Bennett-Chandler reactive flux method
* semiclassical methods, including the phase integration method and the semiclassical initial value representation
* adaptive sampling methods for Markov state model generation
* approaches such as nested sampling, which use many short trajectories to estimate partition functions

The challenge is that most developers of scientific software are not familiar with the way such packages can simplify their development process, and the packages that exist may not scale to exascale. This workshop will introduce scientific software developers to useful middleware packages, improve scaling, and provide an opportunity for scientific developers to add support for HTC to their codes.

Major topics that will be covered include:

* Concepts of HTC; how to structure code for HTC
* Accessing computational resources to use HTC
* Interfacing existing C/C++/Fortran code with Python libraries (see the ctypes sketch after this list)
* Specifics of interfacing with Celery/COMPSs
* Challenges in using existing middleware at extreme scale
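For the interfacing topic referenced above, a hedged sketch using only the standard library ctypes module is shown below; the shared-library name and the exposed function signature are hypothetical and would in practice come from the code being wrapped.

    # Hedged ctypes sketch: call an existing compiled C/Fortran routine from Python.
    # The library name and the total_energy(double*, int) signature are hypothetical.
    import ctypes
    import numpy as np

    lib = ctypes.CDLL("./libtrajectory.so")          # hypothetical shared library

    lib.total_energy.restype = ctypes.c_double
    lib.total_energy.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int]

    def total_energy(positions):
        """Pass a contiguous NumPy array to the compiled routine and return its result."""
        pos = np.ascontiguousarray(positions, dtype=np.float64)
        ptr = pos.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
        return lib.total_energy(ptr, pos.size)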

[1] Celery: Distributed Task Queue. http://celeryproject.org, date accessed 14 August 2017.

[2] R.M. Badia et al. SoftwareX 3-4, 32 (2015).

[3] S. Plimpton. J. Comput. Phys. 117, 1 (1995).

[4] W.H. Miller. J. Phys. Chem. A 105, 2942 (2001).

[5] J. Beutier et al. J. Chem. Phys. 141, 084102 (2014).

[6] Du et al. J. Chem. Phys. 108, 334 (1998).

[7] G.A. Huber and S. Kim. Biophys. J. 70, 97 (1996).
