#### With Prof. Christoph Dellago (CD), University of Vienna, and Dr. Donal Mackernan (DM), University College Dublin.

#### Abstract

*Recently there has been a dramatic increase in the use of machine learning in physics and chemistry, including its use to accelerate simulations of systems at an ab-initio level of accuracy, as well as for pattern recognition. It is now clear that these developments will significantly increase the impact of simulations on large scale systems requiring a quantum level of treatment, both for ground and excited states. These developments also lend themselves to simulations on massively parallel computing platforms, in many cases using classical simulation engines for quantum systems.*

*DM: The idea that we should organise an E-CAM review on the use of neural network potentials to greatly accelerate the simulation of systems at an effectively ground state QM level came when I got an email from Nvidia claiming 3 orders of magnitude acceleration over standard DFT [1, 2]. However, I now know that you have been following this track for some time. Can you tell me about how you started working on this?*

CD: So we actually started to think about this when we were trying to simulate some phase transitions in nano-rods made of copper sulfide observed in experiments in the group of Paul Alivisatos at UC Berkeley. They asked us if we could do some simulations for them to understand the mechanism of this transition. The nano-rods change shape at some point when they are heated under the beam of an electron microscope. The first thing we tried to do was to find a potential that works for copper sulfide. The problem was we didn’t find one. We tried many things and nothing worked. We were not able to reproduce the structures of this material, and certainly not the phase transitions. So at some point I bumped into this paper by Jörg Behler and Michele Parrinello [3] where they modelled potential energy surfaces with neural networks, and that’s how it started with us. I got in touch with Behler, who is really the person who pushed this technology to make it useful; he’s now at the University of Göttingen. I sent Andreas Singraber, a very talented doctoral student, over to his group to learn this technique, and while we were discussing copper sulfide we also had the idea to look into water. Eventually the thing that we want to do is to study the formation and breaking of chemical bonds in aqueous systems, and that’s how we started to think about really developing a neural network potential for water [4].

*DM: I think you actually had a little exploration long before with Phil Geiger [5] in the context of the polymorph work?*

CD: Actually it was not before, it was about the same time. You know, when you are thinking about something you suddenly realize that everything is related, that you can use one idea in other contexts too. So I was also at the time thinking about simulating the nucleation of ice in water. You need to be able to detect local structures, whether a certain molecule is in a crystalline environment or in a liquid environment, and with water that’s notoriously difficult. So my simple idea was that if there is enough information in the input functions of these neural networks to predict energies, there should be enough information to characterize and classify crystalline structures [5]. As you know there are many crystalline structures of ice, numbered from 1 to 17, and there are also some different amorphous structures of ice. So we wanted to use the same idea of Behler and Parrinello to detect local structures and to tell the difference between cubic and hexagonal structures, from ice II and ice III and ice IV and so on. It turned out that we were doing much better than other methods developed to classify locally crystalline structures.

*DM: Was that in 2012 or 2013?*

CD: It must have been around then. I think the really important advance in this whole field was to realize that one cannot work with Cartesian coordinates. Rather the first step one needs to do is to calculate some symmetry functions that are fingerprints of structures that contain enough information to compute the energy of these structures. They incorporate some important symmetries, like they are invariant with respect to rotations or translations of a piece of material or the exchange of atoms of the same species. This was really the advance of Behler and Parrinello that made it possible to use neural networks to model condensed phases.

*DM: Can you give us an intuitive idea why these fingerprint functions work?*

CD: What you need to do in this neural network methodology is to teach the neural network to predict the energy and forces acting on molecules based on the configurations of the atoms, i.e. where the atoms are. Now we describe the position of atoms of course with Cartesian coordinates but for a neural network it’s very difficult to learn energies from Cartesian coordinates because if you shift a material as a whole by a certain amount that does not change the energy of course, but it does change all the Cartesian coordinates. The same thing happens if you rotate a block of material. So to help the neural network, you need to compute properties that depend only on the relative positions of atoms and angles. You construct these functions, the so-called symmetry functions in such a way that they do not change if you do operations like the rotation or translation of material. Now you have to define enough of these functions such that the neural network has enough information to predict the energy for this local conformation. But at the same time you don’t want to compute too many of them. Why? Because it costs. You don’t want to spend more time than is needed to allow you to recognize the structure.
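The invariance argument above can be checked numerically. The following is a minimal sketch, assuming a Gaussian radial function with a cosine cutoff in the spirit of the Behler and Parrinello symmetry functions; the function names, the parameters `eta`, `r_s`, `r_c` and the four-atom test cluster are illustrative assumptions, not values from the interview.

```python
import math

def cutoff(r, r_c):
    """Smooth cosine cutoff: 1 at r = 0, 0 at and beyond r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def radial_symmetry_function(positions, i, eta, r_s, r_c):
    """Sum over neighbours j != i of exp(-eta (r_ij - r_s)^2) * f_c(r_ij)."""
    total = 0.0
    for j, pos_j in enumerate(positions):
        if j == i:
            continue
        r_ij = math.dist(positions[i], pos_j)  # depends only on distances
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_c)
    return total

def rotate_z(p, angle):
    """Rotate a point about the z axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(p, shift):
    """Shift a point by a constant vector."""
    return tuple(a + b for a, b in zip(p, shift))

# Hypothetical four-atom cluster (arbitrary length units).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.2, 0.0), (0.3, 0.4, 1.5)]
moved = [translate(rotate_z(p, 0.7), (5.0, -2.0, 3.0)) for p in atoms]
swapped = [atoms[0], atoms[2], atoms[1], atoms[3]]  # exchange two neighbours

g_original = radial_symmetry_function(atoms, 0, eta=1.0, r_s=0.0, r_c=6.0)
g_moved = radial_symmetry_function(moved, 0, eta=1.0, r_s=0.0, r_c=6.0)
g_swapped = radial_symmetry_function(swapped, 0, eta=1.0, r_s=0.0, r_c=6.0)
print(g_original, g_moved, g_swapped)
```

Every Cartesian coordinate changes under the rotation and shift, yet the three printed values agree up to rounding, because the function only sees interatomic distances and sums over neighbours.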

*DM: I’m not an expert in neural networks but I have read a bit about them, general culture and so on, for example the use of decision trees and classifiers. In that case one asks questions about a set of features, teaches the network to classify a set of structures, and then use it to classify unknown structures according to whether they belong to one category or another. Is it possible to explain the symmetry functions you have in this sense?*

CD: Well I think it is pretty much like that except there is a continuum of structures because atoms move due to the thermal motion of the material. Atoms fluctuate around so it’s not enough to know that the average position of the atoms belongs to a certain crystalline structure. You really have to take into account where the atoms are specifically, and so there are many possible configurations the atoms can take. The question that you then ask is what is the energy of this local arrangement of atoms, and the neural network is trained to predict the energy. So it’s not a yes or no question, it’s a question of 0.5 or 0.6, 3 or 10 or 29. You just really want to know the energy of this arrangement of atoms. The neural network is nothing else than a very complicated and very flexible fitting function that depends on many parameters. You know that Von Neumann once said that given four parameters I can fit an elephant and if I have another parameter I can make its tail wiggle. So this is basically what the neural network does. It’s a really flexible fitting function which depends on thousands of parameters. These are the so-called weights, and they are adjusted in a way that the function represented by the neural network produces the energy as a function of these features. The symmetry functions are nothing else than the features that go into the neural network as an input, and based on those, the neural network outputs the energy and also the forces that act on the atoms. Now the question is how does the neural network know what these parameters should be? In fact your network does not know, so you always need some reference configurations for which you know the energy, which you have computed for instance using ab-initio simulations. You then adjust the parameters of the neural network such that your network gives you the right energies. The hope is that after training, it also gives the right energies for configurations that it has not learnt or seen before.
But really there is no physics in the neural network itself, it’s all learned from a set of reference data. That’s really the power of the network, that’s what makes it so flexible. You don’t go in with a preconceived notion of how the potential energy should look like as a function of the positions of the atoms.
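The "flexible fitting function" picture can be made concrete with a toy example. This is a sketch under strong assumptions: a single input feature, one small hidden layer, plain gradient descent, and a made-up analytic `reference_energy` standing in for ab-initio data. Real neural network potentials use many symmetry functions per atom, larger networks and more careful training schemes.

```python
import math
import random

def reference_energy(g):
    """Toy stand-in for an ab-initio energy as a function of one feature g."""
    return math.sin(2.0 * g) + 0.5 * g

def predict(params, g):
    """One hidden tanh layer followed by a linear output node."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * g + b) for w, b in zip(w1, b1)]
    energy = sum(v * h for v, h in zip(w2, hidden)) + b2
    return energy, hidden

def mean_squared_error(params, data):
    return sum((predict(params, g)[0] - e) ** 2 for g, e in data) / len(data)

def train(data, n_hidden=5, steps=3000, lr=0.02, seed=0):
    """Adjust the weights by full-batch gradient descent on the squared error."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(steps):
        grad_w1 = [0.0] * n_hidden
        grad_b1 = [0.0] * n_hidden
        grad_w2 = [0.0] * n_hidden
        grad_b2 = 0.0
        for g, e in data:
            energy, hidden = predict((w1, b1, w2, b2), g)
            err = 2.0 * (energy - e) / len(data)
            grad_b2 += err
            for k in range(n_hidden):
                grad_w2[k] += err * hidden[k]
                back = err * w2[k] * (1.0 - hidden[k] ** 2)  # tanh derivative
                grad_w1[k] += back * g
                grad_b1[k] += back
        w1 = [w - lr * d for w, d in zip(w1, grad_w1)]
        b1 = [b - lr * d for b, d in zip(b1, grad_b1)]
        w2 = [v - lr * d for v, d in zip(w2, grad_w2)]
        b2 -= lr * grad_b2
    return w1, b1, w2, b2

# Reference configurations: feature values with known (toy) energies.
training_set = [(-1.0 + 0.1 * k, reference_energy(-1.0 + 0.1 * k)) for k in range(21)]
untrained = train(training_set, steps=0)   # same random start, no updates
trained = train(training_set)
error_before = mean_squared_error(untrained, training_set)
error_after = mean_squared_error(trained, training_set)
unseen = 0.55                              # not part of the training grid
print(error_before, error_after, predict(trained, unseen)[0], reference_energy(unseen))
```

Nothing in `predict` encodes any physics; whatever the network knows after training comes from the reference energies, which is the point made in the answer above.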

*DM: Last year we were at an E-CAM state of the art workshop in molecular simulation in Leiden. There was a speaker who gave a great presentation on deep learning applied to image recognition. One thing that he mentioned was that in deep learning there are hidden motifs that the network learns, and somehow uses in image recognition. And if you actually look at these motifs and the images, you realize they make sense. Is there something analogous here?*

CD: I think that’s a very interesting question. It would be very interesting to try to understand how the information is stored in the neural network. I think what you are referring to is a talk where one of the subjects was automatic image recognition. There you can go and look into the neural network and see which kind of features actually let the neural network distinguish between a dog and a cat. You can go into the neural network and look into different levels, hierarchies of how the information is stored. Perhaps one could do also something similar for the neural networks used to represent potential energy surfaces. Maybe the information is stored at different levels of coarse graining, who knows. I don’t know but I think it would be an interesting question to analyze these networks. Right now we use them as a black box because simulations done at the ab-initio level are often too expensive for the things that we want to do. We are interested in understanding phase transitions, nucleation, understanding the dynamics and structure of nano crystals. They often consist of thousands of atoms that need to be followed for a long period in time and for these purposes ab-initio simulations are just too expensive, so that’s why we try to use neural networks to learn the information from expensive ab-initio simulations, and then use the neural networks that are much cheaper to carry out longer simulations of bigger systems. One criticism that I think is valid, is that the neural network gives an answer, but you don’t know how the answer comes about in a way that you can understand. So looking a bit more closely into this I think would be an interesting question.

*DM: In your polymorph recognition paper with Phil Geiger [5] you used the symmetry functions of the Behler and Parrinello paper as input functions, but found it was necessary to include additional symmetry functions to characterize the different polymorphs that you were seeing. How difficult is it to design these input functions? Are they obvious or are they rather subtle, and do you have to suffer a great deal before you find them?*

CD: Actually, there are different types of symmetry functions, some of them are more sensitive to radial distributions, some are more sensitive to angular patterns. Typically you need 30 to 40 symmetry functions to characterize a structure. Now regarding symmetry functions, you have put your finger really on an important point, because this is really the art of the whole procedure. Once you have found symmetry functions that work the rest is more or less automatic. So really the effort goes into devising a sufficient but not too large set of symmetry functions that contain enough information to be able to predict the energy from this set of numbers. Jörg Behler and Michele Parrinello apparently came up with a few symmetry functions that worked very well. If you apply them to different materials maybe you have to change the parameters a little, but that’s usually it. One thing that one could try to do more systematically is to find methods to find symmetry functions. The SOAP approach of Gabor Csanyi is an important step in this direction and related work is also going on in the group of Gerbrand Ceder. Maybe one could learn also the symmetry functions such that one can reduce their number because the cost of such a calculation is mainly the cost of computing them, so having a better, more systematic approach in defining them would be very useful. Right now this is where most work goes.
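The distinction between radial and angular sensitivity can be illustrated with two tiny environments. This is a sketch only: the angular function below is modelled on the commonly quoted Behler and Parrinello form, and the parameter values (`eta`, `zeta`, `lam`, `r_c`) and the two three-atom geometries are assumptions chosen for illustration.

```python
import math
from itertools import combinations

def cutoff(r, r_c):
    """Smooth cosine cutoff: 1 at r = 0, 0 at and beyond r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def radial(center, neighbours, eta, r_c):
    """Radial fingerprint: depends only on centre-neighbour distances."""
    total = 0.0
    for n in neighbours:
        r = math.dist(center, n)
        total += math.exp(-eta * r ** 2) * cutoff(r, r_c)
    return total

def angular(center, neighbours, eta, zeta, lam, r_c):
    """Angular fingerprint: sums over neighbour pairs and the angle they span."""
    total = 0.0
    for a, b in combinations(neighbours, 2):
        r_a = math.dist(center, a)
        r_b = math.dist(center, b)
        r_ab = math.dist(a, b)
        dot = sum((x - c) * (y - c) for x, y, c in zip(a, b, center))
        cos_theta = dot / (r_a * r_b)
        total += ((1.0 + lam * cos_theta) ** zeta
                  * math.exp(-eta * (r_a ** 2 + r_b ** 2 + r_ab ** 2))
                  * cutoff(r_a, r_c) * cutoff(r_b, r_c) * cutoff(r_ab, r_c))
    return 2.0 ** (1.0 - zeta) * total

origin = (0.0, 0.0, 0.0)
bent = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]     # neighbours at 90 degrees
linear = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]  # neighbours at 180 degrees

radial_bent = radial(origin, bent, eta=0.1, r_c=4.0)
radial_linear = radial(origin, linear, eta=0.1, r_c=4.0)
angular_bent = angular(origin, bent, eta=0.1, zeta=2.0, lam=1.0, r_c=4.0)
angular_linear = angular(origin, linear, eta=0.1, zeta=2.0, lam=1.0, r_c=4.0)
print(radial_bent, radial_linear, angular_bent, angular_linear)
```

Both environments have two neighbours at the same distance, so the radial value cannot tell them apart, while the angular value can. A practical descriptor set mixes both types with several parameter choices each.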

*DM: At an intuitive level if you look at a material like say a metal as opposed to a semiconductor or insulator in terms of the symmetry functions, are there any obvious differences or would you not be able to tell one from the other?*

CD: I don’t think there are any basic fundamental differences between the symmetry functions for these different types of materials. They all look very similar. They mainly depend on how you select your parameters such that the symmetry functions capture changes in the typical distances and angles that atoms have, that they are able to capture the typical structures of these materials. There is no general rule that says for metals you should use these types of symmetry functions, for semiconductors you need others.

*DM: Are there other systems where you suspect that you might have a huge difficulty with this approach? I guess your symmetry functions are, if I understand correctly, locally defined, so if you had long-range correlations in a system, might it conspire to make life difficult, or is this not really a relevant issue?*

CD: Well of course the network approach that we use is local in the sense that for a particular set of atoms it looks only into a neighbourhood of a certain size. Now it’s not a problem in principle to include long-range forces too. For instance, you can easily include Coulombic interactions but you need to do that basically on top of the network. You can add the Coulombic forces to the network. There are some technical issues that you need to solve if you do that but it’s certainly possible in principle. Jörg Behler and his group have done that. So it’s easy to combine the neural network approach with long-range Coulomb forces. Now, if there are other correlations that are not captured by Coulombic interactions, then perhaps you might have a problem, but you will see if you play with the cut-off radius that you won’t get the same result.

*DM: I guess what you say for long range Coulombic interactions also applies for Van der Waals interactions, considering that it is included in the second paper [4]. We know that for some systems longer range interactions are much more important than people realized before, but then again you can in principle treat it using particle mesh Ewald, basically doing the same sort of thing for either interaction?*

CD: Absolutely. In the paper that we wrote we did not do that, but in principle of course it could be done. You could compute the Van der Waals forces and the Coulomb interaction with particle mesh Ewald, and subtract that from the part that the neural network then does on a local level. We know that the long-range component of Van der Waals interactions is really important if you look at interfaces. The value of the surface tension for instance depends strongly on whether you include the long-range part of the Van der Waals interactions. We did not do that in this paper that you are mentioning.
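The bookkeeping behind "subtract the long-range part, let the network learn the rest" can be sketched as follows. Several things here are assumptions for illustration: a direct pairwise Coulomb sum over a finite cluster replaces particle mesh Ewald, the charges are fixed point charges, units are reduced so the Coulomb prefactor is 1, and `short_range_model` is a hypothetical placeholder for a trained network.

```python
import math
from itertools import combinations

def coulomb_energy(positions, charges):
    """Direct pair sum q_i q_j / r_ij (finite cluster, reduced units)."""
    return sum(
        charges[i] * charges[j] / math.dist(positions[i], positions[j])
        for i, j in combinations(range(len(positions)), 2)
    )

def short_range_target(reference_total, positions, charges):
    """Energy the local model is asked to learn: reference minus long-range part."""
    return reference_total - coulomb_energy(positions, charges)

def total_energy(short_range_model, positions, charges):
    """Prediction: local model output plus the explicit long-range part."""
    return short_range_model(positions) + coulomb_energy(positions, charges)

# Hypothetical three-site cluster with a made-up reference energy.
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
charges = [-0.8, 0.4, 0.4]
reference_total = -1.25

target = short_range_target(reference_total, positions, charges)

def short_range_model(_positions):
    """Placeholder for a trained network that reproduces its target exactly."""
    return target

recovered = total_energy(short_range_model, positions, charges)
print(target, recovered)
```

If the local model reproduces its target, adding the explicit long-range term back recovers the reference energy. The same split would apply to a long-range dispersion term.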

*DM: Let me go to another question. You were looking at polymorphs in the context of water. How about pharmaceuticals?*

CD: Well, for pharmaceuticals one knows that some substances can exist in a number of different polymorphs, and their activity, their solubility for instance, and how well they are absorbed in the body, depends on the polymorph itself. So in the context of pharmaceuticals it is very important to be able to understand how a certain polymorph forms rather than another, and neural networks could certainly be used to study the formation of these polymorphs, to detect local structures at a very early stage, to understand how these polymorphs nucleate.

*DM: Can one do something analogous in experiment in the context of pattern recognition as you did for polymorphs?*

CD: That’s an interesting question. You would have to have the ability to produce a number of local probes that depend on this structure, and that have sufficient space and time resolution. I think that’s difficult. In the context of colloidal systems on the other hand you have all the information you need to determine structures. There, whether the information comes from a simulation or from an experiment doesn’t matter. In both cases you could apply neural networks to detect structures. Whether there are really ways to do this I don’t know.

*DM: I have another question related to biased sampling. Is it possible to use your neural networks and associated symmetry functions to explore the different free energy structures of a system in the sense of biased sampling? In other words, could you construct some form of biased potential that would drive the system to visit these different structures?*

CD: Certainly. You can do something very simple like in the paper we did with Phil Geiger that you mentioned. You can force the system to crystallize by applying a bias on the size of the largest crystalline chunk of the material that exists in your system, where the size of the crystalline piece is determined using neural networks. So whatever you can do with other methods to bias a simulation to go towards regions that haven’t been explored previously, you can do also by using the neural network as a collective variable because in the end it gives you some collective description of the system. That is, the collective variable is a function of the position of many atoms and you can certainly apply a bias defined with respect to it. The question is just how do you pick these collective variables in a way that pushes the system to explore important parts of configuration space. In principle it can certainly be done and it has been done already. Whether we can go beyond this, and use neural networks or other machine learning approaches to recognize when some new structure has been found, that’s a different story. Maybe that’s another possibility to apply machine learning tools to detect on the fly whether your system has crossed some important barrier and opened up a gate into a new part of configuration space that hasn’t been visited before.
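A bias on the largest crystalline cluster can be sketched in a few lines. The per-atom labels would come from a trained classifier; here they are hard-coded booleans. The neighbour lists, the harmonic form of the bias, and the names `target_size` and `spring_constant` are illustrative assumptions rather than details taken from the paper.

```python
def largest_cluster_size(is_crystalline, neighbours):
    """Size of the largest connected group of atoms labelled crystalline."""
    seen = set()
    best = 0
    for start, flag in enumerate(is_crystalline):
        if not flag or start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:  # depth-first walk over crystalline neighbours
            atom = stack.pop()
            size += 1
            for other in neighbours[atom]:
                if is_crystalline[other] and other not in seen:
                    seen.add(other)
                    stack.append(other)
        best = max(best, size)
    return best

def umbrella_bias(cluster_size, target_size, spring_constant):
    """Harmonic bias energy that pulls the cluster size towards target_size."""
    return 0.5 * spring_constant * (cluster_size - target_size) ** 2

# Hypothetical six-atom chain: atom 2 is liquid-like and splits the crystal.
labels = [True, True, False, True, True, True]
neighbour_lists = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

size = largest_cluster_size(labels, neighbour_lists)
bias = umbrella_bias(size, target_size=5, spring_constant=0.2)
print(size, bias)
```

In this example the largest cluster contains atoms 3, 4 and 5, so `size` is 3 and the bias penalises the gap to the target of 5. Scanning `target_size` over a range of windows is one common way to drive a system along such a collective variable.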

*DM: That is intriguing. I am now going to go to your neural network paper including Van der Waals interactions. Why is it important to add dispersion?*

CD: Well, people realized quite some time ago that dispersion interactions are important. In fact this was the reason why people started developing corrections to functionals in density functional theory, in particular generalized gradient approximation (GGA) functionals, as they have the problem that they do not include Van der Waals interactions because they miss important correlations between charge fluctuations. So people have known for some time that Van der Waals interactions are important in many condensed matter systems and are not captured by these functionals, and they have started to develop corrections. One correction that is used a lot is the so-called D3 correction of Stefan Grimme, and this is also the correction that we used [4]. What’s new in our paper is that we were able to basically do simulations at this level of theory for much bigger systems and much longer times, because we translated the information of density functional theory plus Van der Waals corrections into a neural network potential that is much cheaper to use for large systems in simulations. So for instance, we were able to compute the density of water as a function of temperature for a particular pressure, and to look for the density maximum. Everybody knows that water has a density maximum at four degrees Celsius, and if you try to find out whether your model of water reproduces this maximum, you will find if you do it with DFT that a number of functionals do not reproduce this important feature of water. You will also find that if you add Van der Waals interactions the density maximum appears. We were also able to compute the melting point of ice.
That’s another thing that is not trivial to do with density functional theory just because simulations are very expensive, but with the neural network, which basically contains the same information as the density functional, we were able to do a calculation of the melting point, and we found that the melting point predicted using Van der Waals interactions is actually quite close to the experimental value. More importantly the difference in density between ice and water has the right sign. Ice floats on water so that means the density of ice needs to be below that of water. If you do not include Van der Waals interactions in your description then you get an ice that sinks and that’s not good. But with Van der Waals interactions you have at least the right sign even if the densities are not reproduced perfectly. Now where does this all come from? Of course we know (and this is not something that we found out, this is something that was known before) that the strength of hydrogen bonds plays a very important role for these anomalies of water. It turns out that hydrogen bonds need to have just the right strength. They shouldn’t be too strong or too weak in order to reproduce the density maximum. Where does the density maximum come from? Usually a substance expands when you heat it, but water in the range of temperature between zero and four degrees shrinks. So how can that be? If you look at it more closely, and imagine that you sit on a water molecule while the water is heated, you find that the neighbours immediately close to it will move away, but at the same time this makes the structure around the molecule weaker because the hydrogen bonds weaken, and then water molecules from the second shell can come closer, leading to an overall increase of density, so everything shrinks a bit. This is what produces the density maximum of water. This can happen only if the hydrogen bonds have just the right strength.
Namely, they need to weaken a bit in order to let another molecule from the second shell come in, and this is also the reason that ice is less dense than water. This is also something that we studied using these neural network potentials, once with Van der Waals interactions and once without.

*DM: So in effect the hydrogen bonding impedes the packing?*

CD: That’s right. This is something that people have known before using empirical potentials. But from ab initio it had not really been possible so far to look into these problems just because the calculations are so expensive, and with a neural network potential in between you can go to rather large systems and long times. But for us the really most important motivation for developing neural network potentials for water was not to look into the importance of Van der Waals interactions. Rather we wanted to be able to look at the breaking and formation of covalent bonds to hydrogen, which governs for instance processes like proton transfer and the auto-ionization of water molecules in the liquid, which leads to the pH of water. These processes are not possible to study using standard empirical potentials because they assume that the water molecules have a certain geometry and never break. There are empirical potentials which include bond breaking and making, but it’s actually difficult to make them work right. That was our motivation for developing neural network potentials for water. Right now we’re actually looking at proton transfer processes, that is the diffusion of OH⁻ ions and excess protons, and eventually we would also like to look at the auto-ionization of water.

*DM: So the implications of this work for transport properties in water could be pretty significant?*

CD: We hope that we will be able to study these processes in a statistically exhaustive way. You can look at these processes with ab-initio simulation but it’s hard to collect enough data to allow you to really see what’s going on in detail. By being able to simulate these systems, including chemical reactions, for longer times and larger systems, we hope to be able to collect enough information to look into these processes in detail.

*DM: A couple of years ago or so Michele Parrinello and co-authors did a beautiful paper on proton transport in water. I think they called it the gossamer effect [7]. I don’t work in the area myself but I came across the paper and was fascinated by it. The Grotthuss mechanism takes place there, which is very important for proton transport, for example in electrolytes in fuel cells, or for that matter in a biological context. So how important do you think your work will be there, given the very significant corrections to densities and so on that you have found in your study of water? Presumably the effects there could be pretty significant?*

CD: Yes, I would assume so just because simply the distance is changed. With different densities you have different distances between molecules, and hopping rates might really change. Whether it really changes mechanisms I don’t know. We will find out.

*DM: How about then for protein water systems? Water is everywhere in biology, and hydrogen bonding often plays a huge role within proteins, and of course between proteins and water. So what do you think are the possible implications of these Van der Waals corrections?*

CD: Well the empirical potentials that people use to model such systems contain Van der Waals interactions, and the parameters are selected such that Van der Waals interactions are at least in an effective way taken into account. It’s just that the ab-initio description often lacks the inclusion of Van der Waals interactions so I don’t expect to really see a lot of change with respect to results obtained with empirical force fields.

*DM: Would it be possible to replace these empirical force fields with neural network potentials? Presumably they can work on the same scale as empirical methods, and yet be rigorously based to the extent that the ab-initio methods (with Van der Waals corrections) are correct?*

CD: There is an important limitation of the current version, I should say, of the neural network approach, and that is that the computational effort grows very quickly with the number of species involved. Why? Because you have a large number of possible arrangements of atoms of different species, and you really run into a combinatorial problem in the number of possible configurations. That also determines the number of configurations that you need in the training phase of the potential, which grows exponentially with the number of species. So doing a pure system consisting of one single element is rather easy. Two elements like the case of water can be done and is not difficult. Three is hard, four very hard, and beyond that I don’t know. Jörg Behler has worked on ternary systems, and quaternary systems were already very difficult. So without an important simplification, this approach cannot be used in a straightforward way to simulate biological systems, because there you cannot get away with just two or three types of atoms. This is an important limitation. Maybe one could exploit the fact that not all environments that could in principle exist actually occur, just because they have very unfavourable energy, so you don’t need to take them into account in the training, but this will require some effort and some serious thinking. Gerbrand Ceder and his group recently made progress on this issue and have devised symmetry functions that also take the composition into account and allow one to go to a larger number of chemical species.
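One part of this growth is easy to count. The sketch below assumes an element-resolved descriptor in which each central element needs one group of radial functions per neighbour element and one group of angular functions per unordered pair of neighbour elements. It counts descriptor groups only and says nothing about how many training configurations are needed, which is the harder problem described above. The element symbols are just labels.

```python
from itertools import combinations_with_replacement

def descriptor_groups(elements):
    """Count (radial, angular) symmetry-function groups for a set of elements."""
    n_radial = 0
    n_angular = 0
    for _central in elements:
        # One radial group per neighbour element.
        n_radial += len(elements)
        # One angular group per unordered pair of neighbour elements.
        n_angular += sum(1 for _ in combinations_with_replacement(elements, 2))
    return n_radial, n_angular

systems = [["Cu"], ["H", "O"], ["C", "H", "O"], ["C", "H", "N", "O"]]
totals = []
for elements in systems:
    n_radial, n_angular = descriptor_groups(elements)
    totals.append(n_radial + n_angular)
    print(len(elements), "elements:", n_radial, "radial +", n_angular, "angular groups")
```

Under these assumptions the total number of groups goes from 2 for one element to 10, 27 and 56 for two, three and four elements, and each group typically holds several functions with different parameters.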

*DM: How about if I were to use an empirical force field for say the protein and a neural network potential for the water. Would this work if suitable adjustments were made to your potential?*

CD: Well, you would still have to model the interactions of the protein and water with an empirical potential or whatever you want to put into the water. Also the neural network is still more expensive than an empirical model, let’s say like TIP4P, which is much less expensive than a neural network model, although the neural network model is much cheaper than an ab-initio simulation, so a neural network is somewhere in between. It will not be possible to simulate very large systems containing hundreds of thousands of water molecules with the neural network potential. If it can be done with an empirical potential then I recommend doing that. Only if there is something that you cannot do with an empirical potential, like a chemical reaction, does use of a neural network potential give a real advantage.

*DM: You said that you currently need to have a different set of symmetry functions for every species. Suppose we want to look at, say, a protein system. We know that the predominant things we are going to get will be hydrogens and oxygens, which you already have, and carbon and nitrogen and a few other things, so it is almost a quaternary system. I don’t know whether it is an important point or not, but is it so hard?*

CD: To work with just four elements, it can be done. Right now we’re looking into making a potential for carbon, oxygen and hydrogen because we want to model carbon dioxide in water at high pressures and temperatures, and from there adding nitrogen is hopefully a small step, and then one could try to do it. But the issue is always the specific question that you want to address, and if it’s something that you can do with an empirical force field, then there is no need to use a neural network potential, which is expensive to develop. It’s not something that you can do automatically. A person needs to sit and spend a considerable amount of time developing the neural network potential, and then the calculation itself, once you have it, is also more expensive than the calculation with an empirical potential. So it should be clear before you start doing this what it is that you want to learn.

*DM: In this ANI paper by Smith et al [2] they claim that their calculation runs something like three orders of magnitude faster than DFT, and they claim they don’t need to limit the scope of their code to two species. I do not think they included Van der Waals corrections. I think there were other limitations but I can’t remember what they were but it’s something worth looking at. Another question would be that you built your neural network potentials using various forms of DFT functionals including Van der Waals interactions. But of course there are more accurate ab-initio approaches such as coupled cluster and of course quantum Monte Carlo?*

CD: Absolutely

*DM: And so presumably given that the basic concept of the neural network potentials is the mathematical formulation of the symmetry functions and how they are built in the context of the neural network, you could just as well use them for a coupled cluster or quantum Monte Carlo?*

CD: Absolutely right. The neural network doesn’t care where you get your reference data from, so if you have good quantum Monte Carlo data I think that would be great, and you can use the neural network to learn from these data. I think this is the way to go. It’s just that for many systems quantum Monte Carlo is still expensive. You have to consider that to train the neural network you need thousands of energies, maybe ten thousand or more, for configurations that should be as different from each other as possible. So it’s not a trivial thing to do, and that’s why for the water case we did the calculation using two different density functionals, RPBE and BLYP, but including the Van der Waals corrections [6]. If you have the computational power to automatically do similar calculations for the system of interest then that’s great and that’s the way to go. It’s important to realize that the neural network does not add anything to the reference data. It inherits all the properties of the reference system. This it does very well because it’s so flexible and can reproduce what you teach it very well, but it does not add anything, so anything the reference system gets wrong, your network will get wrong. In the best cases the neural network reproduces the properties of the reference system exactly. Of course there’s always a small error or difference, but experience shows that this difference is small.

*DM: We should try to talk about exascale. I guess from what you just mentioned that if you have a massive machine you can expect to do a lot of calculations independently in parallel, that is, in the sense of being embarrassingly parallel. Do you imagine that this might lend itself very well to training a neural network on more accurate methods like coupled cluster or quantum Monte Carlo?*

CD: So talking about high performance computing and about parallel computation, my student Andreas Singraber has implemented a version of the neural network which is now included in LAMMPS [8]. So you can use the parallelism of LAMMPS to do neural network calculations. You can actually do the parallelization on two levels. You can distribute the calculation of symmetry functions over different processors or cores, and on a higher level you can also distribute the energy computation for different local domains over different cores. We have also developed a parallel training program, so the training can be done in parallel. Something that we have not done systematically yet is to look at how the whole procedure scales. Can we obtain the scaling that will bring us into the exascale regime or not? This is a question that we still need to answer. But in principle I think that the whole method is very suitable for parallelization.
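The higher level of parallelism described here works because the total energy is a sum of per-atom contributions that each depend only on a local neighbourhood. The following is a minimal sketch of that idea and not the LAMMPS implementation: `atomic_energy` is a hypothetical stand-in for "symmetry functions plus atomic network", the domains are crude slabs along x, and a thread pool is used only to keep the example portable (a production code would use MPI-style domain decomposition).

```python
import math
from concurrent.futures import ThreadPoolExecutor

R_CUT = 3.0  # assumed cutoff radius, arbitrary units

def atomic_energy(i, pos):
    """Hypothetical local energy of atom i: depends only on neighbours within R_CUT."""
    local = sum(math.exp(-math.dist(pos[i], pos[j]))
                for j in range(len(pos))
                if j != i and math.dist(pos[i], pos[j]) < R_CUT)
    return math.tanh(local)

def domain_energy(atom_indices, pos):
    """Energy contribution of one domain: sum over the atoms it owns."""
    return sum(atomic_energy(i, pos) for i in atom_indices)

def total_energy(pos, n_domains=4):
    """Split atoms into slabs along x, evaluate each slab independently, then sum."""
    order = sorted(range(len(pos)), key=lambda i: pos[i][0])
    size = max(1, math.ceil(len(order) / n_domains))
    domains = [order[s:s + size] for s in range(0, len(order), size)]
    with ThreadPoolExecutor(max_workers=n_domains) as pool:
        partial = list(pool.map(domain_energy, domains, [pos] * len(domains)))
    return sum(partial)
```

In this sketch the only communication is the final sum of partial energies; in a real spatial decomposition each domain would additionally need the positions of atoms within the cutoff of its boundary.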

*DM: I guess one thing one needs to know about the learning scheme is, when the job is divided up over different processors, to what extent the different processors need to communicate with each other: whether you need simultaneous updates, or whether the calculations on different cores can be done independently of each other?*

CD: That’s what we do. But the more important question is the execution of the neural network, when you use it to compute forces for molecular dynamics.

*DM: How does it compare with an empirical force field, once the neural network has been trained?*

CD: The neural network can be slower, say up to a factor of a thousand.

*DM: Is that because you have not had the chance to do the level of optimization that has occurred for empirical force fields, or is it for a different reason?*

CD: Well, you just have to calculate a lot of these symmetry functions. There are many, and you have to do it for every atom. Perhaps there are ways to do it more efficiently. Right now we have put more effort into making it work. Certainly it is worth trying to make it work more efficiently, but we will never be able to get it to work as fast as TIP4P or the mW model of water. It’s just that the number of operations needed to evaluate empirical potentials is much smaller. In the case of the neural network, first of all you have to compute the symmetry functions, which depend on angles and distances, and they need to be summed up over all combinations of neighbours. Then, once you have computed the symmetry functions, you need to do all the operations that go into the evaluation of the neural network, and these are many operations that you need to do for every atom. The good news is that it scales linearly with system size. It’s just that the number of operations you need to evaluate the neural network for every atom is much larger than the number of operations you need to evaluate an empirical potential.
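To make the cost argument concrete, here is a minimal sketch of the per-atom work: radial and angular symmetry functions of the kind commonly used in Behler-Parrinello type potentials, followed by a small feed-forward network. All parameter values and weights are illustrative, untrained assumptions; a real potential uses many more symmetry functions, trained weights, and separate networks per chemical element.

```python
import math

def cutoff(r, r_c):
    """Cosine cutoff: 1 at r = 0, decaying smoothly to 0 at r = r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def radial_sf(i, pos, eta, r_s, r_c):
    """Radial symmetry function of atom i: Gaussians in r_ij summed over neighbours j."""
    return sum(math.exp(-eta * (math.dist(pos[i], pos[j]) - r_s) ** 2)
               * cutoff(math.dist(pos[i], pos[j]), r_c)
               for j in range(len(pos)) if j != i)

def angular_sf(i, pos, eta, zeta, lam, r_c):
    """Angular symmetry function of atom i, summed over unordered neighbour pairs (j, k)."""
    total = 0.0
    others = [j for j in range(len(pos)) if j != i]
    for a, j in enumerate(others):
        for k in others[a + 1:]:
            r_ij = math.dist(pos[i], pos[j])
            r_ik = math.dist(pos[i], pos[k])
            r_jk = math.dist(pos[j], pos[k])
            fc = cutoff(r_ij, r_c) * cutoff(r_ik, r_c) * cutoff(r_jk, r_c)
            if fc == 0.0:
                continue  # at least one distance is beyond the cutoff
            dot = sum((cj - ci) * (ck - ci)
                      for ci, cj, ck in zip(pos[i], pos[j], pos[k]))
            cos_theta = max(-1.0, min(1.0, dot / (r_ij * r_ik)))
            total += ((1.0 + lam * cos_theta) ** zeta
                      * math.exp(-eta * (r_ij**2 + r_ik**2 + r_jk**2)) * fc)
    return 2.0 ** (1.0 - zeta) * total

def descriptor(i, pos, r_c=6.0):
    """Feature vector for atom i; the parameter values are illustrative only."""
    return [radial_sf(i, pos, eta=0.5, r_s=0.0, r_c=r_c),
            radial_sf(i, pos, eta=0.5, r_s=1.0, r_c=r_c),
            angular_sf(i, pos, eta=0.05, zeta=2, lam=1.0, r_c=r_c),
            angular_sf(i, pos, eta=0.05, zeta=2, lam=-1.0, r_c=r_c)]

# Hypothetical, untrained weights: 4 inputs -> 3 tanh hidden units -> 1 output.
W_HIDDEN = [[0.3, -0.2, 0.5, 0.1], [-0.4, 0.6, 0.2, -0.3], [0.1, 0.1, -0.5, 0.4]]
B_HIDDEN = [0.0, 0.1, -0.1]
W_OUT = [0.7, -0.5, 0.2]
B_OUT = 0.05

def atomic_energy(features):
    """One forward pass of the small atomic network."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(W_HIDDEN, B_HIDDEN)]
    return sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT

def total_energy(pos):
    """Sum of atomic contributions: one descriptor and one network pass per atom."""
    return sum(atomic_energy(descriptor(i, pos)) for i in range(len(pos)))
```

The angular term loops over pairs of neighbours for every atom, which is where much of the extra cost relative to a simple pair potential comes from. This sketch loops over all atoms for brevity; with a finite cutoff and neighbour lists the work per atom is bounded, which gives the linear scaling with system size mentioned above.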

*DM: Is it possible to coarse grain the water?*

CD: Coarse graining is another thing that we could try to do with a neural network. Perhaps you can do coarse graining including many-body effects, rather than writing everything in terms of pair potentials and then looking for pair potentials that reproduce the properties of the system. One could try a more systematic approach and encode effective interactions into a neural network.

*DM: I saw that there were a couple of papers on excited states citing the ANI paper by Smith et al. and using similar approaches, so to what extent can neural networks potentially be used for excited states?*

CD: Actually here at my university, the University of Vienna, there is a group in chemistry, that of Leticia Gonzalez and Philipp Marquet, and they have actually already done it. They are developing neural networks like ours, also in collaboration with Jörg Behler, but to model excited states. So in principle there is no fundamental difference: if you have the required training data, then you can train a network to reproduce the energy surface belonging to an excited state. On top of that, of course, you need to have some prescription for how to move from one surface to another.

*DM: I’m going to ask my final question, which in a way we have already touched on. How do you think E-CAM should exploit these advances?*

CD: Well, in Work Package 1 of E-CAM we care about molecular dynamics simulations, and of course that’s a wide field, so we concentrate on rare events and on free energy calculations. In both of these cases neural network potentials offer some possibilities that we have already pointed out in our article. You can use them to model a substance, and since you can also describe the breaking and making of chemical bonds, we can use these potentials to look at chemical reactions, and then use the tools that we developed in Work Package 1, such as open path sampling tools, to study their mechanism and kinetics. I don’t know if the neural network really gives completely new possibilities, but you can certainly use the neural network potentials in combination with the tools that we developed in E-CAM. There is also an interesting connection to other work packages of E-CAM, because in developing a neural network you need reference data, and this is of course something that the electronic structure work package can provide.

*DM: Great. Listen, I’m delighted with this very interesting conversation we have had. Thank you.*

#### Bibliography

[1] *AI Quantum Breakthrough: ANAKIN-ME* Nvidia Webinar www.e-cam2020.eu/ai-quantum-breakthrough-anakin-me-webinar/

[2] JS Smith, O Isayev, AE Roitberg, *ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost,* Chem. Sci. 8, 3192-3203 (**2017**)

[3] Jörg Behler, Michele Parrinello, *Generalized neural-network representation of high-dimensional potential-energy surfaces,* Phys. Rev. Lett. 98, 146401 (**2007**)

[4] Tobias Morawietz, Andreas Singraber, Christoph Dellago, and Jörg Behler, *How van der Waals interactions determine the unique properties of water,* PNAS 113 (30) (**2016**)

[5] Philipp Geiger and Christoph Dellago, *Neural networks for local structure detection in polymorphic systems,* J. Chem. Phys. 139, 164105 (**2013**)

[6] S Grimme, J Antony, S Ehrlich, H Krieg, *A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu,* J. Chem. Phys. 132 (15), 154104 (**2010**)

[7] Ali Hassanali, Federico Giberti, Jérôme Cuny, Thomas D. Kühne, and Michele Parrinello, *Proton transfer through the water gossamer,* PNAS 110 (34) (**2013**)

[8] LAMMPS http://lammps.sandia.gov/