A possible roadmap for the coarse graining and multiscale simulation community

A community-driven review with contributions from E-CAM, “Unfolding the prospects of computational (bio)materials modeling”, has just been published in the Journal of Chemical Physics. It covers the history, developments, and challenges of coarse graining (CG) and multiscale simulation (MS), and offers a set of recommendations on how these challenges may be addressed.

The perspective emerged in part from a two-week school and workshop, involving some 35 experts in this area, hosted by the Lorentz Center in the Netherlands.

These challenges are present in academic settings, where day-to-day issues include data exchange, conversion and storage, validation and reproducibility, and the availability of tested and efficient simulation and parameterization tools (SPTs). They weigh particularly on industrial end-users, who struggle to benefit from these advances because of unfamiliarity with the underlying concepts, difficulties in extracting and interpreting available CG data, and a lack of commercially available SPTs. To tackle these challenges, the CG/MS communities could be well served by taking more systematic advantage of the lessons learned in the molecular dynamics (MD) community. One obvious link is the choice between Python, C, and C++ for interfacing, given that Python is becoming a preferred choice in the MD community. The example of PLUMED might be one to emulate for CG/MS, i.e., to aim at providing a large variety of CG methods in a library that runs on multiple engines. Such a strategy may at the same time facilitate improved performance of CG simulations and parameterization efforts on large-scale machines, possibly also involving machine learning (ML) techniques, through better exploitation of massive parallelism.
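As a rough illustration of what "a library that runs on multiple engines" could look like from the Python side, the sketch below separates a CG method (here a simple center-of-mass mapping) from the engine that supplies the atomistic data. The `Engine` interface, the `InMemoryEngine` stand-in, and the function name are hypothetical and chosen for this example only; they are not taken from PLUMED or from the review.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

Vector = tuple[float, float, float]


class Engine(Protocol):
    """Hypothetical minimal interface that an engine adapter would expose."""

    def positions(self) -> Sequence[Vector]: ...
    def masses(self) -> Sequence[float]: ...


@dataclass
class InMemoryEngine:
    """Toy stand-in for a real MD engine adapter (illustration only)."""

    _positions: list[Vector]
    _masses: list[float]

    def positions(self) -> Sequence[Vector]:
        return self._positions

    def masses(self) -> Sequence[float]:
        return self._masses


def center_of_mass_mapping(
    engine: Engine, beads: Sequence[Sequence[int]]
) -> list[Vector]:
    """Map atoms to CG beads using the center of mass of each atom group.

    The function only relies on the Engine interface, so any engine with an
    adapter providing positions() and masses() could reuse it unchanged.
    """
    pos, mass = engine.positions(), engine.masses()
    mapped: list[Vector] = []
    for group in beads:
        total = sum(mass[i] for i in group)
        mapped.append(
            tuple(sum(mass[i] * pos[i][k] for i in group) / total for k in range(3))
        )
    return mapped
```

Under these assumptions, supporting a new engine would mean writing one adapter that satisfies `Engine`, while the CG methods themselves stay in the shared library.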

In particular, the authors make the following recommendations (for details, see the full publication):

Develop and adopt an ontology of CG models and workflows, taking the existing EMMC/EMMO framework as a starting point. Consequently, make this standard available to the community as a useful tool for documenting computational results in scientific publications.
Set up an identification system for heterogeneous simulation data to ease data extraction.
Select one flexible data format, for instance, the H5MD format, and define a reward system that encourages the common use of this format.
Define and adopt a framework for assessing quality, based on verification, validation, and uncertainty quantification. Validation should concern thermodynamics, dynamics, kinetics, average structure-dependent properties, or structures, with an emphasis on the kinds of properties that are intended to be reproduced. Several measures are needed to cover this heterogeneous modeling domain.
Invest in better validation and education. The proof that the CG/MS methodology can provide (at least qualitatively) relevant results can only be readily given by experts who are actively involved in the development of such methodologies. Making this investment will also have an important educational effect. In combination with easing access to state-of-the-art CG methodology and data, it will generate a larger user group in academia and industry and strengthen the position of the modeling community as a whole.
Define general rules for data storage, keeping in mind that it may sometimes be more efficient to re-simulate data if input parameter files are provided, either through publication or a database, as long as the versioned, benchmarked simulation engines with back-functionality are freely available. A system of rules can also be exploited for improving and easing data management plans.
Set up and maintain databases for massive storage of heterogeneous simulation data. As the necessary manpower and, thus, funding will rely on proving the huge benefit of such a database, all stakeholders should be involved.
Introduce a DOI for code development.
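Two of the recommendations above, an identification system for simulation data and storing input files so that data can be re-simulated, can be sketched together in a few lines. The example below is a minimal illustration using only the Python standard library. The manifest fields (`engine`, `engine_version`, `model`, `inputs`, `id`) are assumptions made for this sketch and are not taken from the review, from EMMO, or from the H5MD specification.

```python
import hashlib
import json


def record_id(metadata: dict) -> str:
    """Return a deterministic identifier for a metadata record.

    The record is serialized as canonical JSON (sorted keys, fixed
    separators), so the same content yields the same SHA-256 digest
    regardless of the order in which keys were inserted.
    """
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_manifest(
    engine: str,
    engine_version: str,
    input_files: dict[str, bytes],
    model: str,
) -> dict:
    """Collect what would be needed to re-run a simulation.

    Input files are represented by their checksums; the hypothetical field
    names used here are for illustration only.
    """
    manifest = {
        "engine": engine,
        "engine_version": engine_version,
        "model": model,
        "inputs": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(input_files.items())
        },
    }
    # The identifier is computed before the "id" key is added to the record.
    manifest["id"] = record_id(manifest)
    return manifest
```

Because the identifier depends only on the engine version, the model label, and the input checksums, two groups that deposit the same inputs would obtain the same identifier, while any change to an input file or engine version would produce a different one.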

Reference

G. J. Agur Sevink, Jozef Adam Liwo, Pietro Asinari, Donal MacKernan (E-CAM), Giuseppe Milano, and Ignacio Pagonabarraga (E-CAM), Unfolding the prospects of computational (bio)materials modeling, J. Chem. Phys. 153, 100901 (2020); https://doi.org/10.1063/5.0019773 (open access version available on Zenodo)