The Curse of Dimensionality in Data-Intensive Modeling in Medicine, Biology, and Diagnostics

With Prof. Tim Conrad (TC), Free University of Berlin, and Dr. Donal Mackernan (DM), University College Dublin.

Abstract

Until recently, the idea that methods rooted in statistical physics could be used to elucidate phenomena and underlying mechanisms in biology and medicine was widely considered a distant dream. Elements of that dream are beginning to be realized, aided considerably by machine learning and advances in measurement, as exemplified by the development of large-scale biomedical data analysis for next-generation diagnostics. In this E-CAM interview with Tim Conrad, the growing importance of diagnostics in medicine and biology is discussed. One difficulty faced by such developments, and shared with particle-based simulation, is the “curse of dimensionality”. It manifests itself in problems such as: (a) the use of a very large number of order parameters when trying to identify reaction mechanisms, nucleation pathways, metastable states and reaction rates, or to recognize polymorphs; (b) machine learning applied to electronic structure problems, where, for example, neural-network-based potentials need very high-dimensional basis sets; and (c) systematic coarse-graining, which would ideally start from a very high-dimensional space and systematically reduce its dimension. The opportunities and challenges for scientists engaging with industry are also discussed. Tim Conrad is Professor of “Medical Bioinformatics” at the Institute of Mathematics of the Free University of Berlin and head of MedLab, one of the four laboratories of the MODAL research campus. MODAL is a public-private partnership project which conducts mathematical research on data-intensive modeling, simulation, and optimization of complex processes in the fields of energy, health, mobility, and communication. Tim Conrad is also the founder of three successful start-up companies.

In this E-CAM interview with Prof. Tim Conrad, the growing importance of diagnostics in medicine and biology is discussed, including concepts rooted in signal analysis that are relevant to systematic dimensional reduction and pattern recognition, and the possibilities of applying them to systematic coarse-graining. The opportunities and challenges for scientists founding start-up companies are also discussed, drawing on his own experience.
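The “curse of dimensionality” can be made concrete with a small numerical experiment, given here as our own illustration rather than anything taken from the interview. Points are drawn uniformly in a d-dimensional unit hypercube, and the relative contrast (d_max - d_min) / d_min between the farthest and the nearest point from a random query point is computed. As d grows, this contrast collapses, so distances (and any analysis built on them) become less and less informative. The function name and its parameters are hypothetical.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the distances from one random query point
    to n_points points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast shrinks steadily as the dimension increases.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

The same concentration of distances is one reason why methods working in very high-dimensional order-parameter or feature spaces benefit from a prior dimensional reduction.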

 


Scientific reports from the 2018 E-CAM workshops are now available on our website

 

The scientific reports* from the following workshops conducted in year 3 of the project E-CAM (2018):

  1. E-CAM Scoping Workshop: “Solubility prediction”, 14 – 15 May 2018, Ecole Normale Supérieure de Lyon, France,
  2. E-CAM Scoping Workshop: “Dissipative particle dynamics: Where do we stand on predictive application?”, 24 – 26 April 2018, Daresbury Laboratory, United Kingdom,
  3. E-CAM Extended Software Development Workshop 11: “Quantum Dynamics”, 18 – 29 June 2018, Maison de la Simulation, France,

are now available for download on our website at this location. They will also be included in the CECAM Report of Activities for 2018, published every year on the website www.cecam.org.

 

*© CECAM 2018, all rights reserved.

Please address any comments or questions to info@e-cam2020.eu.


Improving I/O of DL_MESO_DPD files using SIONlib

 

This module uses the SIONlib library to optimize the I/O (writing and reading) of the trajectory files generated by DL_MESO_DPD, the Dissipative Particle Dynamics (DPD) code from the DL_MESO package. SIONlib is a library for writing and reading binary data from several thousands of processors to or from one or a small number of physical files. For parallel access to files, only the open and close functions are collective, while the writing and reading of files can be done asynchronously [1]. In the latest release of DL_MESO_DPD (version 2.6), the MPI version generates multiple trajectory files, one for each MPI task. The interface with SIONlib optimizes the data writing so that a single physical file is produced by several MPI tasks. This drastic reduction in the number of output files benefits the I/O performance of the code and simplifies the maintenance of the output, especially for a large number of MPI tasks.
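To illustrate the idea of many tasks sharing one physical file, here is a minimal sketch in plain Python. It uses neither SIONlib nor MPI, and its file layout is a hypothetical format of our own: a small header stores each task's chunk size, so the offset of every chunk follows from a prefix sum of the sizes (the kind of information a collective open can establish), after which each chunk can be written or read independently. The functions write_shared_file and read_task_chunk are hypothetical names.

```python
import io
import struct
from itertools import accumulate

# Hypothetical header field: one little-endian signed 64-bit integer.
FIELD = "<q"
FIELD_SIZE = struct.calcsize(FIELD)


def write_shared_file(stream, chunks):
    """Write the byte chunks of several (simulated) tasks into ONE file-like stream.

    Hypothetical layout, NOT the SIONlib on-disk format:
    [n_tasks][size_0] ... [size_{n-1}][chunk_0][chunk_1] ...
    Returns the byte offset at which each task's chunk starts.
    """
    sizes = [len(c) for c in chunks]
    header_len = FIELD_SIZE * (len(chunks) + 1)
    # Exclusive prefix sum of the chunk sizes: in an MPI code this is the kind of
    # information the collective open would establish for all tasks at once.
    offsets = list(accumulate(sizes, initial=header_len))[:-1]
    stream.write(struct.pack(FIELD, len(chunks)))
    for size in sizes:
        stream.write(struct.pack(FIELD, size))
    # With the offsets known, every task could now seek and write independently.
    for offset, chunk in zip(offsets, chunks):
        stream.seek(offset)
        stream.write(chunk)
    return offsets


def read_task_chunk(stream, task):
    """Read back the chunk of one task, as a single post-processing process would."""
    stream.seek(0)
    (n_tasks,) = struct.unpack(FIELD, stream.read(FIELD_SIZE))
    sizes = [struct.unpack(FIELD, stream.read(FIELD_SIZE))[0] for _ in range(n_tasks)]
    stream.seek(FIELD_SIZE * (n_tasks + 1) + sum(sizes[:task]))
    return stream.read(sizes[task])


if __name__ == "__main__":
    buf = io.BytesIO()  # stands in for the single physical file
    frames = [f"rank {rank}: trajectory frame".encode() for rank in range(4)]
    write_shared_file(buf, frames)
    print(read_task_chunk(buf, 2))
```

Keeping the chunk sizes in a header is what lets a single reader locate any task's data without scanning the whole file.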

This module is part of the newly developed utilities for the DL_MESO_DPD code within the pilot project on Polarizable Mesoscale Models.

Practical application and exploitation of the code

With this module, a parallel run of DL_MESO_DPD generates a single trajectory file (history.sion) instead of multiple HISTORY files. Accordingly, analogous modifications have to be made to the post-processing utilities that read the HISTORY files. As an example, the changes were implemented in a formatting utility. Besides showing how to adapt the reading, this allows a robust check of the implementation, since the output is human-readable, contains the full trajectories, and can be readily compared with the output obtained from the standard version of DL_MESO_DPD.

The next version of DL_MESO_DPD (currently in development) will handle the writing of files differently, producing a single trajectory file from the start. Nevertheless, the interface proposed here already provides this feature to users of version 2.6 and represents an alternative way of handling the trajectories.

This implementation is meant to demonstrate the feasibility of the interface, not to cover all possible cases. The module's functionality is therefore restricted to the relevant case in which: i) the simulation is run in parallel using MPI, ii) a single SIONlib-type physical file is produced, and iii) the post-processing is done by a single process.

While SIONlib is optimized for a large number of MPI tasks, even reducing several output files to just one is beneficial, for example for maintaining the simulation output.

 

[1] http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/SIONlib/_node.html


Extended Software Development Workshop: Atomistic, Meso- and Multiscale Methods on HPC Systems

If you are interested in attending this event, please visit the CECAM website here. This is a multi-part event, and the date indicated is that of the first meeting. The dates of the follow-up meetings are decided during the first event.

Workshop Description

E-CAM is an EINFRA project funded by H2020. Its goal is to create, develop, and sustain a European infrastructure for computational science, applied to simulation and modelling of materials and biological processes that are of industrial and societal interest. E-CAM builds upon the considerable European expertise and capability in this area.

E-CAM is organized around four scientific areas: molecular dynamics, electronic structure, quantum dynamics, and meso- and multiscale modelling, corresponding to work packages WP1-4. E-CAM gathers a number of groups with complementary expertise in meso- and multiscale modelling and also has well-established contacts with simulation code developers. Among the aims of the groups involved in this area are to produce a software stack by combining software modules, and to further develop existing simulation codes towards highly scalable applications on high-performance computer architectures. A key issue that has been identified is that simulation codes for molecular dynamics and meso- and multiscale applications should be prepared for upcoming HPC architectures. The E-CAM work packages have proposed two approaches: (i) developing and optimizing highly scalable applications, in which a single application runs on a large number of cores, and (ii) developing micro-schedulers for task-farming approaches, in which multiple simulations each run on a smaller partition of a large HPC system and together collect statistics or sample a parameter space, so that only loosely coupled simulations are needed. Both approaches rely on the efficient implementation of simulation codes.
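Approach (ii) can be illustrated with a minimal task-farming sketch: many independent small "simulations" are dispatched to a pool of workers, and their results are collected into statistics over a parameter space. The simulation here is a hypothetical 1D random walk standing in for a real code, and Python's standard thread pool is used purely for portability; on an HPC system each task would instead run on its own partition under a micro-scheduler. The function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_simulation(job):
    """Hypothetical stand-in for one small, independent simulation:
    a 1D random walk returning its final squared displacement."""
    step, seed, n_steps = job
    rng = random.Random(seed)  # each task gets its own random stream
    x = 0.0
    for _ in range(n_steps):
        x += rng.choice((-step, step))
    return step, x * x


def task_farm(steps, replicas=8, n_steps=1000, max_workers=4):
    """Run `replicas` independent walks for every step size and average the results.
    The only coupling between tasks is the final collection of statistics."""
    jobs = [(step, seed, n_steps) for step in steps for seed in range(replicas)]
    samples = {}
    # A thread pool keeps the sketch portable; on an HPC system each job would
    # run on its own partition under a micro-scheduler.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for step, sq_disp in pool.map(run_simulation, jobs):
            samples.setdefault(step, []).append(sq_disp)
    return {step: mean(values) for step, values in samples.items()}


if __name__ == "__main__":
    for step, msd in task_farm([0.5, 1.0, 2.0]).items():
        print(f"step={step}: mean squared displacement ~ {msd:.1f} (theory: {1000 * step**2:.0f})")
```

Because the tasks never communicate while running, such a farm scales almost trivially with the number of partitions; the cost is paid only when the results are gathered.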

Most modern parallel (classical) particle simulation programs use a spatial decomposition method as their underlying parallel algorithm. In this approach, different processors manage different spatial regions of the simulation domain and keep track of the particles located in their respective regions. Processors exchange information (i) to compute interactions between particles located on different processors, and (ii) to transfer particles that have moved into a region managed by a different processor. As a consequence, the workload of a given processor is largely determined by its number of particles or, more precisely, by the number of interactions to be evaluated within its spatial region.
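A minimal sketch of spatial decomposition, assuming a box split into equal-width slabs along x (a simplification; production codes usually use three-dimensional grids of subdomains). It shows how particles are assigned to the processor owning their region, how the per-domain particle count (a first proxy for the workload) is obtained, and which particles would have to migrate after moving. All function names are hypothetical.

```python
import random
from collections import Counter


def assign_to_domains(positions, box_length, n_domains):
    """Spatial decomposition into n_domains equal-width slabs along x:
    return, for every particle, the index of the domain (processor) that owns it."""
    width = box_length / n_domains
    # min() keeps a particle sitting exactly at x == box_length in the last slab
    return [min(int(x // width), n_domains - 1) for x, _y, _z in positions]


def particles_per_domain(owners, n_domains):
    """Particle count per domain, a first proxy for each processor's workload."""
    counts = Counter(owners)
    return [counts.get(d, 0) for d in range(n_domains)]


def migrate(new_positions, old_owners, box_length, n_domains):
    """Indices of particles that moved into another domain and would have to be
    sent to a different processor."""
    new_owners = assign_to_domains(new_positions, box_length, n_domains)
    return [i for i, (old, new) in enumerate(zip(old_owners, new_owners)) if old != new]


if __name__ == "__main__":
    rng = random.Random(1)
    pos = [tuple(rng.uniform(0.0, 10.0) for _ in range(3)) for _ in range(1000)]
    owners = assign_to_domains(pos, 10.0, 4)
    print(particles_per_domain(owners, 4))  # roughly equal for a homogeneous system
```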

Certain systems of high physical and practical interest (e.g. condensing fluids) dynamically develop into a state in which the distribution of particles becomes spatially inhomogeneous. Unless special care is taken, this results in a substantially inhomogeneous distribution of the processors’ workload. Since the work usually has to be synchronized between the processors, the runtime is determined by the slowest processor (i.e. the one with the highest workload). In the extreme case, a large fraction of the processors is idle during these waiting times. This problem becomes particularly severe for strong scaling, where the number of processors is increased at constant problem size: every processor manages smaller and smaller regions, so inhomogeneities become more and more pronounced. This eventually limits the scalability of a given problem, even at processor counts small enough that communication overhead is still negligible.
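The effect described above can be quantified with a small sketch, assuming a hypothetical one-dimensional "condensing fluid" in which most particles sit in a dense droplet. Because the runtime is set by the busiest processor, a simple measure of parallel efficiency is the mean load divided by the maximum load. With equal-width domains, this ratio falls as the number of domains grows at fixed problem size, as in strong scaling. Particle count is used here as a proxy for work, whereas a real code would rather count interactions; the function names are hypothetical.

```python
import random


def slab_loads(xs, box_length, n_domains):
    """Particle count per equal-width slab along x (proxy for per-processor work)."""
    width = box_length / n_domains
    loads = [0] * n_domains
    for x in xs:
        loads[min(int(x // width), n_domains - 1)] += 1
    return loads


def parallel_efficiency(loads):
    """Runtime is set by the busiest processor, so efficiency = mean load / max load."""
    return (sum(loads) / len(loads)) / max(loads)


def clustered_system(n, box_length=100.0, center=30.0, width=5.0, fraction=0.8, seed=0):
    """Hypothetical 1D 'condensing fluid': a dense droplet plus a dilute vapour."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        if rng.random() < fraction:
            x = rng.gauss(center, width)       # particle in the droplet
        else:
            x = rng.uniform(0.0, box_length)   # particle in the vapour
        xs.append(min(max(x, 0.0), box_length))
    return xs


if __name__ == "__main__":
    xs = clustered_system(20000)
    # Strong scaling: same system, more and more processors.
    for p in (2, 8, 32, 128):
        print(p, round(parallel_efficiency(slab_loads(xs, 100.0, p)), 3))
```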

The solution to this problem is the inclusion of dynamic load balancing techniques. These methods redistribute the workload among the processors by lowering the load of the busiest cores and increasing the load of the most idle ones. Fortunately, several successful techniques for putting this strategy into practice are already known (see references). Nevertheless, dynamic load balancing that is both efficient and widely applicable requires highly non-trivial coding work. It has therefore not yet been implemented in a number of important codes of the E-CAM community, e.g. DL_MESO, DL_POLY, ESPResSo and ESPResSo++, to name a few. Other codes (e.g. LAMMPS) have implemented somewhat simpler schemes, which, however, may lack the flexibility to accommodate all important cases. The present proposal therefore suggests organizing an Extended Software Development Workshop (ESDW) within E-CAM, in which developers of CECAM community codes are invited, together with E-CAM postdocs, to work on the implementation of load balancing strategies. The goal of this activity is to increase the scalability of these applications to a larger number of cores on HPC systems for spatially inhomogeneous systems, and thus to reduce their time-to-solution.

The workshop is intended as a major community effort towards improving European simulation codes in the field of classical atomistic, mesoscopic and multiscale simulation. Various load balancing techniques will be presented, discussed and selectively implemented into codes. Sample implementations of load balancing techniques have been done for the codes IMD and MP2C, which are highly scalable particle codes (cf. e.g. http://www.fz-juelich.de/ias/jsc/EN/Expertise/High-Q-Club/_node.html). The technical task is to provide a domain decomposition with flexible adjustment of domain borders. The basic load balancing functionality will be implemented in and provided by a library, which the codes will access via interfaces.
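The central technical task, a domain decomposition with flexible adjustment of domain borders, can be sketched in one dimension. In this minimal illustration (not the interface or algorithm of the planned library; the function names are hypothetical), the inner borders are moved to the quantiles of the particle positions, so that every domain holds about the same number of particles. A practical scheme would usually balance measured work (e.g. interaction counts or timings) rather than particle counts, move the borders gradually between steps, and operate in three dimensions.

```python
import bisect
import random


def loads_for_borders(xs, borders):
    """Count particles in each domain [borders[k], borders[k+1]) along x."""
    loads = [0] * (len(borders) - 1)
    for x in xs:
        k = bisect.bisect_right(borders, x) - 1
        loads[min(max(k, 0), len(loads) - 1)] += 1  # clamp particles on the box edges
    return loads


def rebalance_borders(xs, n_domains, box_length):
    """Place the inner borders at quantiles of the particle positions so that each
    domain holds (nearly) the same number of particles. Outer borders stay fixed."""
    s = sorted(xs)
    n = len(s)
    inner = [s[(k * n) // n_domains] for k in range(1, n_domains)]
    return [0.0] + inner + [box_length]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical inhomogeneous system: a dense droplet plus a dilute vapour.
    xs = [min(max(rng.gauss(30.0, 5.0), 0.0), 100.0) if rng.random() < 0.8
          else rng.uniform(0.0, 100.0) for _ in range(20000)]
    n_domains = 8
    equal = [100.0 * k / n_domains for k in range(n_domains + 1)]
    for name, borders in (("equal", equal), ("rebalanced", rebalance_borders(xs, n_domains, 100.0))):
        loads = loads_for_borders(xs, borders)
        print(name, loads, "max/mean =", round(max(loads) / (sum(loads) / n_domains), 2))
```

Moving borders rather than reassigning individual particles keeps each domain spatially compact, which preserves the locality that makes spatial decomposition efficient in the first place.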

To attract both code developers and postdocs working within E-CAM, the workshop will be split into three parts:

Part 1: preparation meeting (2 days)
– various types of load balancing schemes will be presented conceptually and examples of implemented techniques will be shown
– code developers / owners will present their codes; functionalities will be presented and parallel implementations discussed in view of the technical requirements for implementing load balancing techniques
– an interface definition for exchanging information from a simulation code to a load balancing library will be set up

Part 2: training and implementation (1 week)
– to enable E-CAM postdocs to actively participate in the development, some advanced technical courses on MPI and high-performance C++ will be offered in combination with the PRACE PATC course program at Juelich
– during and after the courses (planned for 2-3 days), participants can start implementing a load balancing scheme into a code
– for those participants who are already on an expert level in HPC techniques, it is possible to start immediately with implementing load balancing schemes

Part 3: implementation and benchmarking (1 week)
– final implementation work with the goal to have at least one working implementation per code
– benchmarks of successful implementations will be conducted on the Juelich supercomputer facilities

The second part will also be open for a broader community from E-CAM, so that the workshop can have an impact on the HPC training of postdocs in E-CAM, which will strengthen their skills and experience in HPC.

Between the face-to-face parts of the workshop, postdocs and developers are expected to continue preparing and working on the load balancing schemes, so that each meeting serves as an important step to synchronise, exchange information and experience, and improve the current versions of the implementations.


New publication is out: “Adaptive Resolution Molecular Dynamics Technique: Down to the Essential”

 

A new publication by the Theoretical and Mathematical Physics in Molecular Simulation group of the Freie Universität Berlin, led by Prof. Luigi Delle Site, E-CAM partner, has been published in the Journal of Chemical Physics. In it, the authors study the role of the thermodynamic force applied in the coupling region of an adaptive resolution molecular dynamics (AdResS) set-up, which ensures thermodynamic equilibrium and the proper exchange of molecules between the atomistically resolved and coarse-grained regions.
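To make the role of the thermodynamic (TD) force more concrete, the following sketch implements a single step of the iterative scheme by which the TD force is usually obtained in the AdResS literature, F_th^(i+1)(x) = F_th^(i)(x) - [M / (ρ_0^2 κ_T)] dρ^(i)/dx, where ρ^(i) is the density profile measured in iteration i, M the molecular mass, ρ_0 the reference density and κ_T the isothermal compressibility. In this sketch, the prefactor is treated as a free parameter, the density profile is a hypothetical toy bump rather than simulation data, and the function names are our own. In practice the force acts only in the coupling region, and the iteration is repeated until the density profile is flat.

```python
import math


def gradient(values, dx):
    """Finite-difference derivative: central differences inside, one-sided at the ends."""
    n = len(values)
    grad = [0.0] * n
    for i in range(n):
        if i == 0:
            grad[i] = (values[1] - values[0]) / dx
        elif i == n - 1:
            grad[i] = (values[-1] - values[-2]) / dx
        else:
            grad[i] = (values[i + 1] - values[i - 1]) / (2 * dx)
    return grad


def update_td_force(force, density, dx, prefactor):
    """One iteration F_new(x) = F_old(x) - prefactor * d(rho)/dx,
    where prefactor stands for M / (rho_0**2 * kappa_T)."""
    grad = gradient(density, dx)
    return [f - prefactor * g for f, g in zip(force, grad)]


if __name__ == "__main__":
    dx = 0.1
    xs = [i * dx for i in range(101)]
    # Hypothetical density excess centred at x = 5 (e.g. in the coupling region).
    density = [1.0 + 0.1 * math.exp(-(x - 5.0) ** 2) for x in xs]
    force = update_td_force([0.0] * len(xs), density, dx, prefactor=1.0)
    # The new force pushes molecules away from the density excess on both sides.
    print(round(force[40], 4), round(force[60], 4))
```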

The publication post-print version is open access and can be downloaded directly from the Zenodo repository here. The publisher AIP version can be found at https://doi.org/10.1063/1.5031206.

This work was performed in the context of the E-CAM pilot project on the development of the GC-AdResS scheme, which is a collaboration with MODAL AG. One of its goals is to develop a library or recipe with which GC-AdResS can be implemented in any MD code. The current focus is to adjust the implemented version of GC-AdResS in GROMACS. The long-term goal of this project is to promote and stimulate the community to use it as a tool for multiscale simulations and analysis. More information about this pilot project can be found here.

Article

Title: Adaptive Resolution Molecular Dynamics Technique: Down to the Essential

Authors: Christian Krekeler, Animesh Agarwal, Christoph Junghans, Matej Praprotnik, Luigi Delle Site

Abstract: We investigate the role of the thermodynamic (TD) force, as an essential and sufficient technical ingredient for an efficient and accurate adaptive resolution algorithm. Such a force applied in the coupling region of an adaptive resolution Molecular Dynamics (MD) set-up, assures thermodynamic equilibrium between atomistically resolved and coarse-grained regions, allowing the proper exchange of molecules. We numerically prove that indeed for systems as relevant as liquid water and 1,3-dimethylimidazolium chloride ionic liquid, the combined action of the TD force and thermostat allows for computationally efficient and numerically accurate simulations, beyond the current capabilities of adaptive resolution set-ups, which employ switching functions in the coupling region.


E-CAM Case Study: The implementation of a hierarchical equilibration strategy for polymer melts, to help studying the rheological properties of new composite materials

Dr. Hideki Kobayashi, Max-Planck-Institut für Polymerforschung, Germany

Abstract

The ability to accurately determine and predict properties of newly developed polymer materials is highly important to researchers and industry, but at the same time represents a significant theoretical and computational challenge. We have developed a novel multiscale simulation method based on the hierarchical equilibration strategy, which significantly decreases the time needed to compute equilibrium properties while satisfying thermodynamic consistency. A number of E-CAM modules were developed and implemented in the ESPResSo++ software package.


Coarse-Graining module, a Component of the Hierarchical Equilibration Strategy for Polymer Melts

To study the properties of polymer melts by numerical simulation, equilibrated configurations must be prepared. However, the relaxation time of high-molecular-weight polymer melts is very long and, according to reptation theory, increases with the third power of the molecular weight. An effective method for decreasing the equilibration time is therefore required. The hierarchical strategy pioneered in Ref. [1] is a particularly suitable way to do this, and the present module provides part of that method.
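A short worked example shows why this cubic scaling quickly becomes prohibitive: with τ the relaxation time and M the molecular weight, doubling M makes equilibration eight times slower, and a tenfold increase makes it a thousand times slower.

```latex
\tau \propto M^{3}
\quad\Longrightarrow\quad
\frac{\tau(2M)}{\tau(M)} = 2^{3} = 8,
\qquad
\frac{\tau(10M)}{\tau(M)} = 10^{3} = 1000 .
```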

To decrease the relaxation time, microscopic monomers are coarse-grained (CG) by mapping each subchain of N_b monomers onto a soft blob. The CG system then has a much lower molecular weight and is therefore equilibrated quickly. The present module provides a Python script that performs this coarse-graining procedure. Implementation details can be found in the module’s documentation in our software library here. This module is part of a set of codes that together implement the hierarchical equilibration strategy of Ref. [1] in ESPResSo++ [2] (for the complete list of modules, see here under ESPResSo++).
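The mapping step can be sketched as follows. This is a minimal illustration, not the module's script: it assumes that each blob is placed at the center of mass of its subchain of N_b monomers, that all monomers have equal mass, and that coordinates are unwrapped across periodic boundaries; the module's own conventions are described in its documentation. The function name coarse_grain_chain is hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def coarse_grain_chain(monomers: Sequence[Vec3], n_b: int) -> List[Vec3]:
    """Map each consecutive subchain of n_b monomers onto one blob placed at the
    subchain's center of mass (equal monomer masses, unwrapped coordinates)."""
    if n_b <= 0 or len(monomers) % n_b != 0:
        raise ValueError("chain length must be a positive multiple of n_b")
    blobs = []
    for start in range(0, len(monomers), n_b):
        sub = monomers[start:start + n_b]
        blobs.append(tuple(sum(m[k] for m in sub) / n_b for k in range(3)))
    return blobs


if __name__ == "__main__":
    # Hypothetical straight chain of 12 monomers along x, with n_b = 4.
    chain = [(float(i), 0.0, 0.0) for i in range(12)]
    print(coarse_grain_chain(chain, 4))
```

A chain of N monomers becomes a chain of N / N_b blobs, which is what makes the CG system so much faster to equilibrate.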

 

Practical application and exploitation of the code

The development of a multiscale method for polymer blends and block copolymers is fundamentally new and needs to be based on first-principles theory. This is therefore an intellectual challenge in its own right. Furthermore, it paves the way for analyzing the physical properties of novel composite materials that attract the attention of industrial companies. Such materials may be promising ingredients of new products such as efficient and environmentally friendly car tires. The implementation of the hierarchical equilibration strategy in the ESPResSo++ package is a step towards this goal. In particular, the practical application of this strategy is the E-CAM pilot project in collaboration with Michelin, aimed at studying the Rheological Properties of New Composite Materials.

E-CAM deliverables D4.2 and D4.3 contain more information on the suite of programs developed under this pilot project.

 

[1] Zhang, G., Moreira, L. A., Stuehn, T., Daoulas, K. C., and Kremer, K., Equilibration of High Molecular Weight Polymer Melts: A Hierarchical Strategy, ACS Macro Lett., 3, 198-203 (2014)

[2] ESPResSo++ is the “Extensible Software Package for Research in Soft Matter based upon C++”, a general-purpose simulation package for soft-matter research, mainly developed at the Max Planck Institute for Polymer Research in Mainz. It is freely available under the GNU General Public License. http://www.espresso-pp.de/


Scientific reports from the 2017 E-CAM workshops are now available on our website

 

The scientific reports* from the following workshops conducted in year 2 of the project E-CAM (2017):

  1. E-CAM Scoping Workshop: “From the Atom to the Material”, 18 – 20 September 2017, University of Cambridge, UK,
  2. E-CAM State-of-the-Art Workshop WP4: Meso and Multiscale Modelling, 29 May – 1 June 2017, University College Dublin, Ireland,

are now available for download on our website at this location. They will also be included in the CECAM Report of Activities 2017, published every year on the website www.cecam.org.

Each report includes:

  • an overview of the remit of the workshop,
  • the workshop program,
  • the list of attendees,
  • the major outcomes,
  • how these outcomes relate to community needs,
  • how the recommendations could be funded,
  • how they relate to society and industry,
  • and their emphasis and impact on software development.

 

*© CECAM 2017, all rights reserved.

Please address any comments or questions to info@e-cam2020.eu.


New publication using the GC-AdResS molecular dynamics technique

 

The publication “Probing spatial locality in ionic liquids with the grand canonical adaptive resolution molecular dynamics technique” (GC-AdResS) by the Theoretical and Mathematical Physics in Molecular Simulation group of the Freie Universität Berlin, led by Prof. Luigi Delle Site, E-CAM partner, describes the use of the GC-AdResS molecular dynamics technique to test the spatial locality of the ionic liquid 1-ethyl 3-methyl imidazolium chloride. The main feature of GC-AdResS is the possibility of coupling two regions treated at different resolutions, combining the advantages of classical atomistic simulations with those of coarse-grained simulations.

The publication post-print version is open access and can be downloaded directly from the Zenodo repository here. The publisher AIP version can be found at http://aip.scitation.org/doi/10.1063/1.5009066.

E-CAM currently runs a pilot project on the development of the GC-AdResS scheme, and one of its goals is to develop a library or recipe with which GC-AdResS can be implemented in any MD code. The current focus is to adjust the implemented version of GC-AdResS in GROMACS. The long-term goal of this project is to promote and stimulate the community to use it as a tool for multiscale simulations and analysis. More information about this pilot project can be found here.

Article

Title: Probing spatial locality in ionic liquids with the grand canonical adaptive resolution molecular dynamics technique

Authors: B. Shadrack Jabes, C. Krekeler, R. Klein and L. Delle Site

Abstract: We employ the Grand Canonical Adaptive Resolution Simulation (GC-AdResS) molecular dynamics technique to test the spatial locality of the 1-ethyl 3-methyl imidazolium chloride liquid. In GC-AdResS, atomistic details are kept only in an open sub-region of the system while the environment is treated at coarse-grained level; thus, if spatial quantities calculated in such a sub-region agree with the equivalent quantities calculated in a full atomistic simulation, then the atomistic degrees of freedom outside the sub-region play a negligible role. The size of the sub-region fixes the degree of spatial locality of a certain quantity. We show that even for sub-regions whose radius corresponds to the size of a few molecules, spatial properties are reasonably reproduced thus suggesting a higher degree of spatial locality, a hypothesis put forward also by other researchers and that seems to play an important role for the characterization of fundamental properties of a large class of ionic liquids.

The Journal of Chemical Physics 148, 193804 (2018)

First GPU version of the DL_MESO_DPD code

DL_MESO_DPD is the Dissipative Particle Dynamics (DPD) code from the mesoscopic simulation package DL_MESO [1], developed by Dr. Michael Seaton at Daresbury Laboratory (UK). This open-source code is available from the Science and Technology Facilities Council (STFC) under both academic (free) and commercial (paid) licenses. E-CAM’s Work-package 4 (WP4), Meso and Multi-scale Modelling, makes use of the DL_MESO_DPD code. See this article on our news feed for more information on how it is used within E-CAM.
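For readers unfamiliar with DPD, the pairwise forces that a DPD code evaluates can be sketched as follows. This is the textbook Groot-Warren form (soft conservative repulsion, dissipative friction and random noise linked by the fluctuation-dissipation relations w_D = w_R^2 and σ^2 = 2γk_BT), not DL_MESO_DPD's implementation, whose force-field options are described in its own documentation. The parameter values (a = 25, γ = 4.5, k_BT = 1) are common literature choices used only for illustration, and the function name is hypothetical.

```python
import math
import random


def dpd_pair_force(r_i, r_j, v_i, v_j, a=25.0, gamma=4.5, kT=1.0, r_c=1.0,
                   dt=0.01, rng=random):
    """Force on particle i due to particle j in the standard (Groot-Warren) DPD model:
    conservative + dissipative + random, all acting along the line of centres."""
    r_ij = [p - q for p, q in zip(r_i, r_j)]
    r = math.sqrt(sum(c * c for c in r_ij))
    if r == 0.0 or r >= r_c:
        return (0.0, 0.0, 0.0)               # no interaction beyond the cut-off
    e = [c / r for c in r_ij]                # unit vector pointing from j to i
    v_ij = [p - q for p, q in zip(v_i, v_j)]
    w_r = 1.0 - r / r_c                      # weight of the conservative and random forces
    w_d = w_r * w_r                          # fluctuation-dissipation: w_D = w_R**2
    sigma = math.sqrt(2.0 * gamma * kT)      # fluctuation-dissipation: sigma**2 = 2 gamma kT
    theta = rng.gauss(0.0, 1.0)              # Gaussian noise, zero mean, unit variance
    e_dot_v = sum(p * q for p, q in zip(e, v_ij))
    magnitude = (a * w_r                                 # soft repulsion
                 - gamma * w_d * e_dot_v                 # friction on the relative velocity
                 + sigma * w_r * theta / math.sqrt(dt))  # thermal noise
    return tuple(magnitude * c for c in e)


if __name__ == "__main__":
    print(dpd_pair_force((0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
```

In a full code, the negative of the returned force is applied to particle j, so that the same noise value acts on both partners and momentum is conserved pair by pair.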

To accelerate the DL_MESO_DPD code on the latest hardware and on future exascale hardware, a first version for NVIDIA GPUs has been developed. This is only a starting point: it does not yet cover all possible cases, nor does it yet support multiple GPUs. However, it represents an HPC milestone for the application, complementing the existing parallel versions for shared and distributed memory (MPI/OpenMP).

Module documentation, including purpose, testing and background information, can be found here. A performance comparison of the GPU and CPU versions can be found in the module documentation and in deliverable D7.2: E-CAM software porting and benchmarking data I, recently submitted to the EU.

[1] Michael A. Seaton, Richard L. Anderson, Sebastian Metz, and William Smith. DL_MESO: highly scalable mesoscale simulations. Molecular Simulation, 39(10):796–821, September 2013.
