Extended Software Development Workshop: Intelligent high throughput computing for scientific applications

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

High throughput computing (HTC) is a computing paradigm focused on the execution of many loosely coupled tasks. It is a useful and general approach to parallelizing (nearly) embarrassingly parallel problems. Distributed computing middleware, such as Celery [1] or COMP Superscalar (COMPSs) [2], can include tools to facilitate HTC, although there may be challenges extending such approaches to the exascale.

Across scientific fields, HTC is becoming a necessary approach in order to fully utilize next-generation computer hardware. As an example, consider molecular dynamics: Excellent work over the years has developed software that can simulate a single trajectory very efficiently using massive parallelization [3]. Unfortunately, for a fixed number of atoms, the extent of possible parallelization is limited. However, many methods, including semiclassical approaches to quantum dynamics [4,5] and some approaches to rare events [6,7], require running thousands of independent molecular dynamics trajectories. Intelligent HTC, which can treat each trajectory as a task and manage data dependencies between tasks, provides a way to run these simulations on hardware up to the exascale, thus opening the possibility of studying previously intractable systems.
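
As a minimal sketch of what "treating each trajectory as a task" can look like, the following Python example uses only the standard library's `concurrent.futures`. The toy integrator `run_trajectory`, the collating step `mean_final_energy`, the driver `run_ensemble`, and all parameter values are hypothetical illustrations and are not taken from any code discussed here. Middleware such as Celery or COMPSs would distribute the tasks across nodes rather than across local threads.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_trajectory(seed, n_steps=1000, dt=0.01):
    """Toy task: velocity-Verlet trajectory of a 1D harmonic oscillator (m = k = 1).

    Each task draws its own initial conditions from its seed, so tasks are
    independent of one another. Returns the total energy at the final step.
    """
    rng = random.Random(seed)
    q, p = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    for _ in range(n_steps):
        p -= 0.5 * dt * q  # half kick (force = -q)
        q += dt * p        # drift
        p -= 0.5 * dt * q  # half kick
    return 0.5 * p * p + 0.5 * q * q


def mean_final_energy(energies):
    """Collating task: depends on the results of all trajectory tasks."""
    return sum(energies) / len(energies)


def run_ensemble(n_trajectories=100, max_workers=4):
    # One task per trajectory; the executor schedules them on the workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        energies = list(pool.map(run_trajectory, range(n_trajectories)))
    # The data dependency is explicit: collation runs only after all tasks finish.
    return energies, mean_final_energy(energies)


if __name__ == "__main__":
    energies, mean_e = run_ensemble()
    print(f"{len(energies)} trajectories, mean final energy = {mean_e:.4f}")
```

Threads are used here only to keep the sketch portable. CPU-bound tasks like these would in practice be sent to separate processes or nodes, which is the part that HTC middleware is meant to handle.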

In practice, many scientific programmers are not aware of the range of middleware to facilitate parallel programming. When HTC-like approaches are implemented as part of a scientific software project, they are often done manually, or through custom scripts to manage SSH, or by running separate jobs and manually collating the results. Using the intelligent high-level approaches enabled by distributed computing middleware will simplify and speed up development.

Furthermore, middleware frameworks can meet the needs of many different computing infrastructures. For example, in addition to working within a single job on a cluster, COMPSs includes support for working through a cluster’s queueing system or working on a distributed grid. Moreover, architecting a software package such that it can take advantage of one HTC library will make it easy to use other HTC middleware. Having all of these possibilities immediately available will enable developers to quickly create software that can meet the needs of many users.

This E-CAM Extended Software Development Workshop (ESDW) will focus on intelligent HTC as a technique that crosses many domains within the molecular simulation community in general and the E-CAM community in particular. Teaching developers how to incorporate middleware for HTC matches E-CAM’s goal of training scientific developers on the use of more sophisticated software development tools and techniques.

The primary goals of the workshop are:

1. To help scientific developers interface their software with HTC middleware.
2. To benchmark, and ideally improve, the performance of HTC middleware as applications approach extreme scale.

This workshop will aim to produce four or more software modules related to intelligent HTC, and to submit them, with their documentation, to the E-CAM software module repository. These will include modules adding HTC support to existing computational chemistry codes, where the participants will bring the codes they are developing. They may also include modules adding new middleware or adding features to existing middleware that facilitate the use of HTC by the computational chemistry community. This workshop will involve training both in the general topic of designing software to interface with HTC libraries, and in the details of interfacing with specific middleware packages.

The range of use for intelligent HTC in scientific programs is broad. For example, intelligent HTC can be used to select and run many single-point electronic structure calculations in order to develop approximate potential energy surfaces. Even more examples can be found in the wide range of methods that require many trajectories, where each trajectory can be treated as a task, such as:

* rare events methods, like transition interface sampling, weighted ensemble, committor analysis, and variants of the Bennett-Chandler reactive flux method
* semiclassical methods, including the phase integration method and the semiclassical initial value representation
* adaptive sampling methods for Markov state model generation
* approaches such as nested sampling, which use many short trajectories to estimate partition functions
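
To illustrate how one of the methods listed above maps onto independent tasks, here is a hedged sketch of a committor estimate: many short trajectories are launched from the same configuration, and the committor is estimated as the fraction that reaches state B before state A. The model (a symmetric random walk on the integers between `a` and `b`) and the names `shoot` and `estimate_committor` are assumptions chosen for illustration. For this model the exact committor is known to be (x0 - a)/(b - a), which gives something to check the estimate against.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def shoot(args):
    """One task: walk from x0 until the walker reaches a (state A) or b (state B).

    Returns 1 if B is reached first, otherwise 0.
    """
    x0, a, b, seed = args
    rng = random.Random(seed)
    x = x0
    while a < x < b:
        x += rng.choice((-1, 1))
    return 1 if x >= b else 0


def estimate_committor(x0, a=0, b=10, n_shots=2000, max_workers=4):
    """Estimate the committor at x0 from n_shots independent trajectories."""
    tasks = [(x0, a, b, seed) for seed in range(n_shots)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        hits = sum(pool.map(shoot, tasks))
    return hits / n_shots


if __name__ == "__main__":
    x0 = 3
    print("estimated committor:", estimate_committor(x0))
    print("exact committor:    ", (x0 - 0) / (10 - 0))
```

Because every shot carries its own seed and shares no state with the others, the shots can be scheduled in any order and on any worker, which is the property that makes these methods a natural fit for HTC.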

The challenge is that most developers of scientific software are not familiar with the ways in which HTC middleware packages can simplify their development process, and the packages that exist may not scale to the exascale. This workshop will introduce scientific software developers to useful middleware packages, improve scaling, and provide an opportunity for scientific developers to add support for HTC to their codes.

Major topics that will be covered include:

* Concepts of HTC; how to structure code for HTC
* Accessing computational resources to use HTC
* Interfacing existing C/C++/Fortran code with Python libraries
* Specifics of interfacing with Celery/COMPSs
* Challenges in using existing middleware at extreme scale

[1] Celery: Distributed Task Queue. http://celeryproject.org, date accessed 14 August 2017.

[2] R.M. Badia et al. SoftwareX 3-4, 32 (2015).

[3] S. Plimpton. J. Comput. Phys. 117, 1 (1995).

[4] W.H. Miller. J. Phys. Chem. A 105, 2942 (2001).

[5] J. Beutier et al. J. Chem. Phys. 141, 084102 (2014).

[6] Du et al. J. Chem. Phys. 108, 334 (1998).

[7] G.A. Huber and S. Kim. Biophys. J. 70, 97 (1996).


Extended Software Development Workshop: Quantum Dynamics

If you are interested in attending this event, please visit the CECAM website here.

Workshop Description

Quantum molecular dynamics simulations describe the behavior of matter at the microscopic scale and require the combined effort of theory and computation to achieve an accurate and detailed understanding of the motion of electrons and nuclei in molecular systems. Theory provides the fundamental laws governing the dynamics of quantum systems, i.e., the time-dependent Schrödinger equation or the Liouville-von Neumann equation, whereas numerical techniques offer practical ways of solving those equations for applications. For decades now, theoretical physicists and quantum chemists have been involved in the development of approximations, algorithms, and computer software that together have enabled, for example, the investigation of photo-activated processes, like exciton transfer in photovoltaic compounds, or of nonequilibrium phenomena, such as current-driven Joule heating in molecular electronics. The critical challenge ahead is to beat the exponential growth of the numerical cost with the number of degrees of freedom of the studied problem. In this respect, a synergy between theoreticians and computer scientists is becoming more and more beneficial now that high-performance computing (HPC) facilities are widely accessible; it will lead to an optimal exploitation of the available computational power and to the study of molecular systems of increasing complexity.

From a theoretical perspective, the two main classes of approaches to solving the quantum molecular dynamical problem are wavepacket propagation schemes and trajectory-based (or trajectory-driven) methods. The difference between the two categories lies in the way the nuclear degrees of freedom are treated: either fully quantum mechanically or within the (semi)classical approximation. In the first case, basis-function contraction techniques have to be introduced to represent the nuclear wavefunction as soon as the problem exceeds 5 or 6 dimensions. Probably the most successful efforts in this direction have been oriented towards the development of the multi-configuration time-dependent Hartree (MCTDH) method [1]. Other strategies are also continuously proposed, focusing for instance on the identification of procedures to optimize the “space” where the wavefunction information is computed, e.g., by replacing Cartesian grids with Smolyak grids [2], and thus effectively reducing the computational cost of the calculation. In the second case, the nuclear subsystem is approximated classically, or semiclassically. Although leading to a loss of some information, this approximation offers the opportunity to access much larger systems for longer time-scales. Various examples of trajectory-based approaches can be mentioned, ranging from the simplest, yet very effective, trajectory surface hopping and Ehrenfest schemes [3], to the more involved but also more accurate coupled-trajectory mixed quantum-classical (CTMQC) [4] and quantum-classical Liouville equation (QCLE) [5]. At the interface between wavepacket and trajectory schemes, Gaussian-MCTDH [6], variational multi-configuration Gaussian (vMCG) [7], and multiple spawning [8] exploit the support of trajectories to propagate (Gaussian) wavepackets, thus recovering some of the information lost with a purely classical treatment. 
In the case of trajectory-based techniques, the literature provides a significant number of proposals that aim to recover some of the quantum-mechanical features of the dynamics by appropriately choosing the initial conditions, based on the sampling of a Wigner distribution [9].

From the computational point of view, a large part of the calculation effort is spent evaluating electronic properties. In fact, the nuclei move under the effect of the electronic subsystem, either “statically” occupying its ground state or “dynamically” switching between excited states. Also, the nuclear dynamics part of a calculation itself becomes a very costly computational task in the case of wavepacket propagation methods. Therefore, algorithms for molecular dynamics simulations are not only required to reproduce realistically the behavior of quantum systems in general cases, but they also have to scale efficiently on parallelized HPC architectures.

The extended software development workshop (ESDW) planned for 2018 has three main objectives: (i) build upon the results of ESDW7 of July 2017 to enrich the software library for trajectory-based propagation schemes; (ii) extend the capabilities of the existing modules by including new functionalities, thus giving access to a broader class of problems that can be tackled; (iii) construct links among the existing and the new modules to transversally connect methods for quantum molecular dynamics, types of modules (HPC/Interface/Functionality), and E-CAM work-packages (WP2 on electronic structure).

The central projects of the proposed ESDW, which are related to the modules that will be provided for the E-CAM library, are:
1. Extension of the ModLib library of model Hamiltonians, especially including high-dimensional models, which are used to test and compare existing propagation schemes, but also to benchmark new methods. The library consists of a set of subroutines that can be included in different codes to generate diabatic/adiabatic potential energy surfaces, and eventually, diabatic and nonadiabatic couplings, necessary for both quantum wavepacket methods and trajectory-based methods.
2. Use of machine-learning techniques to construct analytical forms of potential energy surfaces based on information collected along on-the-fly calculations. The Quantics software [10] provides the platform for performing direct-dynamics propagation employing electronic-structure properties determined at several different levels of theory (HF, DFT, or CASSCF, for example). The sampled nuclear configuration space is employed to build a “library” of potentials, that will be used for generating the potential energy surfaces.
3. Development of an interface for CTMQC. Based on the CTMQC module proposed during the Extended Software Development Workshop (ESDW) 7, the interface will allow the evolution of the coupled trajectories according to the CTMQC equations based on electronic-structure information calculated from quantum-chemistry packages, developing a connection between the E-CAM WP2 on electronic structure and WP3 on quantum dynamics. Potentially, CTMQC can be adapted to the Quantics code, since the latter has already been interfaced with several electronic-structure packages. Optimal scaling on HPC architectures is fundamental for maximizing efficiency.
4. Extension of the QCLE module developed during the ESDW7 to high dimensions and general potentials. Two central issues need to be addressed to reach this goal: (i) the use of HPC infrastructures to efficiently parallelize the multi-trajectory implementation, and (ii) the investigation of the stochastic sampling scheme associated with the electronic part of the time evolution. Progress in these areas will aid greatly in the development of this quantum dynamics simulation tool that could be used by the broader community.
5. Development of a module to sample initial conditions for trajectory-based procedures. Based on the PaPIM module [11] proposed during the ESDW7, sampling of initial conditions from a Wigner distribution will be adapted to excited-state problems, overcoming the usual approximation of a molecule pictured as a set of uncoupled harmonic oscillators. Also, an adequate sampling of the ground vibrational nuclear wavefunction would ensure calculations of accurate photoabsorption cross-sections. This topic connects various modules of the E-CAM WP3 since it can be employed for CTMQC, QCLE, and for the surface-hopping functionality (SHZagreb developed during the ESDW7) of Quantics.
6. Optimization of some of the modules for HPC facilities, adopting hybrid OpenMP-MPI parallelization approaches. The main goal here is to be able to exploit different architectures by adapting different kinds of calculations, e.g., classical evolution of trajectories vs. electronic-structure calculations, to the architecture of the computing nodes.
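
As a point of reference for project 5, the "usual approximation" mentioned there can be sketched in a few lines. For a single harmonic mode of mass m and frequency omega in its vibrational ground state, the Wigner distribution is a product of Gaussians in position and momentum, with widths sigma_q = sqrt(hbar / (2 m omega)) and sigma_p = sqrt(m omega hbar / 2). The function name `sample_wigner_ground_state` and the parameter values below are hypothetical, and this is not the PaPIM implementation. It only shows the uncoupled-harmonic-oscillator baseline that the planned module aims to go beyond.

```python
import math
import random


def sample_wigner_ground_state(masses, omegas, n_samples, hbar=1.0, seed=0):
    """Draw (q, p) initial conditions for a set of uncoupled harmonic modes.

    Each mode is sampled independently from its ground-state Wigner
    distribution, W(q, p) ~ exp(-m*w*q**2/hbar - p**2/(m*w*hbar)).
    Returns a list of (q, p) pairs, where q and p are lists with one
    entry per mode.
    """
    rng = random.Random(seed)
    sigma_q = [math.sqrt(hbar / (2.0 * m * w)) for m, w in zip(masses, omegas)]
    sigma_p = [math.sqrt(m * w * hbar / 2.0) for m, w in zip(masses, omegas)]
    samples = []
    for _ in range(n_samples):
        q = [rng.gauss(0.0, s) for s in sigma_q]
        p = [rng.gauss(0.0, s) for s in sigma_p]
        samples.append((q, p))
    return samples


if __name__ == "__main__":
    # Two hypothetical modes, in units where hbar = 1.
    for q, p in sample_wigner_ground_state([1.0, 2.0], [1.0, 0.5], n_samples=3):
        print(q, p)
```

Each sampled (q, p) pair can then seed one independent trajectory. On average, the sampled energy of each mode equals its zero-point energy, hbar * omega / 2.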

The format and organization described here focus specifically on the production of new modules. Parallel or additional activities, e.g., a transversal workshop on optimization of I/O with electronic structure codes and possible exploitation of advanced hardware infrastructures (e.g., the booster cluster in Juelich), will also be considered based on input from the community.

[1] H. D. Meyer, U. Manthe, L. S. Cederbaum. Chem. Phys. Lett. 165 (1990) 73.
[2] D. Lauvergnat, A. Nauts. Spectrochimica Acta Part A 119 (2014) 18.
[3] J. C. Tully. Faraday Discuss. 110 (1998) 407.
[4] S. K. Min, F. Agostini, I. Tavernelli, E. K. U. Gross. J. Phys. Chem. Lett. 8 (2017) 3048.
[5] R. Kapral. Annu. Rev. Phys. Chem. 57 (2006) 129.
[6] G. A. Worth, I. Burghardt. Chem. Phys. Lett. 368 (2003) 502.
[7] B. Lasorne, M. J. Bearpark, M. A. Robb, G. A. Worth. Chem. Phys. Lett. 432 (2006) 604.
[8] M. Ben-Nun, J. Quenneville, T. J. Martínez. J. Phys. Chem. A 104 (2000) 5161.
[9] J. Beutier, D. Borgis, R. Vuilleumier, S. Bonella. J. Chem. Phys. 141 (2014) 084102.
[10] Quantics. A suite of programs for molecular quantum dynamics. http://stchem.bham.ac.uk/~quantics/doc/
[11] PaPIM. A code for calculation of equilibrated system properties (observables). http://e-cam.readthedocs.io/en/latest/Quantum-Dynamics-Modules/modules/PaPIM/readme.html


High Throughput Computing Workshop

E-CAM is organising a one-week (16-20 July 2018) Extended Software Development Workshop in Turin, Italy, that will focus on intelligent high throughput computing (HTC) as a technique that crosses many domains within the molecular simulation community in general and the E-CAM community in particular. The workshop will be a hybrid learning/coding event targeted at scientists with particular problems to solve. There will be 3 days of tutorial content presenting 3 different task management frameworks and 2 days of code development time with the framework developers to help you integrate them into your application.


Extended Software Development Workshop: Meso and multiscale modeling

If you are interested in attending this workshop, please visit the CECAM website below.


Extended Software Development Workshop: Classical Molecular Dynamics

If you are interested in attending this workshop, please visit the CECAM website below.


Extended Software Development Workshop: Quantum MD

If you are interested in attending this workshop, please visit the CECAM website below.


Extended Software Development Workshop: Meso and multiscale modeling

If you are interested in attending this workshop, please visit the CECAM website below.


Extended Software Development Workshop: Trajectory Sampling

This is the third of E-CAM’s Extended Software Development Workshops, on the theme of trajectory sampling.


Extended Software Development Workshop: Wannier90

The aim of the workshop is to share recent developments related to the generation and use of maximally-localised Wannier functions and to either implement these developments in, or interface them to, the Wannier90 code. It will also be an opportunity to improve and update existing interfaces to other codes and write new ones. The format will be deliberately open, with the majority of the time allocated for coding and discussion.


Electronic Structure Library Coding Workshop

This is the first E-CAM Extended Software Development Workshop, taking place in Zaragoza, Spain. The Electronic Structure Library is a new project to build a community-maintained library of software for use in electronic structure simulations. The goal is to create an extended library that can be employed by everyone for building their own packages and projects.
