On the margins of a recent multiscale simulation workshop, a discussion began between a prominent pharmaceutical industry scientist and representatives of E-CAM and EMMC about the unfolding Fourth Industrial Revolution and the role that particle-based simulation and statistical methods will play in it. The impact of simulation is predicted to become very significant. This discussion is intended to raise public awareness of how Industry 4.0 is getting under way in companies, and how academic research will support that transformation.
Authors: Prof. Pietro Asinari (EMMC and Politecnico di Torino, denoted below as PA), Dr. Donal MacKernan (E-CAM and University College Dublin, denoted below as DM), and a prominent pharmaceutical industry scientist (name withheld at the author’s request as the views expressed are personal, denoted below as IS)
DM: What is the Fourth Industrial Revolution?
IS: This is the prediction that modern manufacturing, impelled by “Industry 4.0” 1, will transform into smart factories in which the whole production process, from research and development to fill and finish, is dominated and run autonomously by connected machines.
PA: This is still a long way off. I would say that, in the smart factories of the future, the production process will be partly integrated, with connected machines taking some autonomous decisions.
IS: Perhaps, but companies like UniteLabs already focus on fully automated laboratories, for example. In addition, daily work in offices will also change, as most analogue processes will be replaced by their digital counterparts. Moreover, knowledge and data will be stored, maintained, protected and distributed via modern web-based platforms. As a consequence, most workflows will be crucially affected by the smart usage, interpretation and evaluation of large amounts of data by means of artificial intelligence, machine learning and digital transformation. Furthermore, advanced simulation approaches such as digital twins, mechanistic models, hybrid models and integrated process models will significantly increase in importance and will contribute to the efficient design and development of processes and products.
PA: Digitalization of office work may also bring some unexpected changes because, once fully digitalized, this work could also be extensively delocalized.
IS: Delocalisation and working from home are important, but they are not the key effect. Of course, cost reduction is important, and many companies contract smaller, cheaper companies, especially for IT and consulting services. However, what really drives innovation is time. Pharmaceutical companies always aim to shorten the “time to market” or “time to clinic”. Whatever makes our processes faster, more reliable, more transferable or more stringent is highly appreciated. Other examples, such as natural language processing tools for reports or the digitalised storage and maintenance of standard operating procedures (SOPs), are more important and will bring much more value in the future.
DM: How seriously does industry consider the fourth revolution, and how is it impacting their current activities and future plans?
IS: Manufacturers recognise the importance of digital transformation, but most are just starting their first initiatives in this direction. A few others, mainly small and more agile start-ups, already use digital approaches like machine learning and artificial intelligence routinely and as business cases. As a consequence, the implementation of roadmaps and efficient strategies for digitalisation is considered one of the hottest and most urgent topics in most companies. Many data scientists are therefore being hired to support and facilitate this transformation. Companies are motivated by the obvious benefits, such as more efficient and cheaper processes and workflows, but also by the emergence of novel competitors in the field of pharmaceuticals. For instance, Google recently announced its interest in drug production and drug design, whereas Samsung has already installed its first biopharmaceutical production sites.
PA: This is true for some start-ups, but not for all small and medium-sized enterprises (SMEs), which cannot always invest a lot of money in digitalisation. Hence SMEs also need some specific support from public agencies in order to embrace digitalisation.
IS: On that point, I slightly disagree. If small or medium-sized enterprises do not invest in and innovate their digital strategies, they will disappear. It is a must, specifically in the pharmaceutical sector. Novartis recently claimed to be a “data processing company”.
DM: What are the likely societal benefits, for example, in the context of pharmaceutical and health science industries and personalised medicine, or in the context of advanced materials or manufacture?
IS: Personalised drugs, smart wearables, digital health care and intelligent devices for drug injection are attracting a lot of interest in the pharmaceutical industry. In addition, pre-screening for potential diseases like cancer by means of machine learning approaches is extremely important. These initiatives will definitely help patients in their daily lives or protect them from diseases. In the future, the efficient design of drugs with the help of computational methods will help to fight Alzheimer’s disease, COPD, AIDS and cancer, among others.
PA: Do you mean here Artificial Intelligence or Machine Learning?
IS: Whatever you can imagine: neural networks, genetic algorithms, QSAR methods, etc. Moreover, the use of simulation approaches for biopharmaceutical processes will also reduce the number of experiments needed and provide greater process robustness. This will bring a new level of drug quality and a faster time to clinic, reducing both the time to release of drugs and their development costs.
DM: What commercial advantages do the companies that are adopting it anticipate, in terms of drug discovery or of more reliable and cheaper processing?
IS: Usually, the attrition rate is quite high, such that most potential drugs in development never make it to the patient. A clever use of computational models is therefore highly appreciated. Moreover, it is also well known that a faster time to clinical use in the first product phase significantly lowers the cost of drug development. Hence, companies spend a lot of money on understanding production processes and implementing reliable computer models. Besides costs, regulatory agencies also enforce the use of digital approaches, which usually requires explicit knowledge of the underlying processes. As a specific example, the FDA encourages pharmaceutical companies to combine results from simulation models with experimental data in order to demonstrate process robustness, thereby fulfilling the principles of “quality by design”. Hence, the use of simulations definitely lowers regulatory hurdles.
DM: How much data comes from experiment/clinical data, and how much currently comes from simulation, and how should data be generated and stored (so that it can be safely, easily and reliably accessed)?
IS: The ratio between experimental and simulation data is hard to estimate. In research, computational screening methods and homology modelling are common and important tools. In contrast, drug formulations in combination with purification steps are considerably harder to simulate. However, due to recent digital initiatives supported by the FDA, the amount of simulation data will increase significantly in the coming years. As for storage, a recent initiative that is widely accepted and supported in industry and academia holds the FAIR principles to be the most promising approach. FAIR stands for the Findable, Accessible, Interoperable and Reusable storage of data. Hence, individual data sets from different sources must be harmonised and standardised in their format, which in turn fosters the exchange of data between departments within companies and between companies.
DM: What are data ontologies, and their role in the definition of standards in data I/O, standards in data format, and data storage requirements?
IS: Ontologies can be interpreted as an efficient and standardized glossary for the unique deposition of data sets under specific descriptors. As an example, temperature entries can be deposited under Temp, temperature, T, etc. A standardized descriptor like TEMPERATURE can be regarded as a unique definition, in close analogy to SI units. With standards from ontologies, the exchange, storage and maintenance of data sets from different sources in one data bank are significantly simplified.
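The temperature example can be made concrete with a small sketch. It is only an illustration of the idea of mapping source-specific labels onto one standardized descriptor, not a real ontology tool: the alias table, the PRESSURE entry and the function names build_lookup and harmonise are all hypothetical.

```python
# Hypothetical alias table: each standardized descriptor lists the
# source-specific labels that should be deposited under it.
ALIASES = {
    "TEMPERATURE": {"temp", "temperature", "t"},
    "PRESSURE": {"p", "press", "pressure"},  # assumed extra entry, for illustration
}


def build_lookup(aliases):
    """Invert the alias table into a case-insensitive label -> descriptor map."""
    lookup = {}
    for descriptor, labels in aliases.items():
        lookup[descriptor.lower()] = descriptor
        for label in labels:
            lookup[label.lower()] = descriptor
    return lookup


LOOKUP = build_lookup(ALIASES)


def harmonise(record):
    """Return a copy of `record` whose keys are standardized descriptors.

    Unknown labels raise KeyError instead of being guessed, and two labels
    that map to the same descriptor raise ValueError.
    """
    harmonised = {}
    for label, value in record.items():
        descriptor = LOOKUP.get(label.strip().lower())
        if descriptor is None:
            raise KeyError(f"no standardized descriptor for label {label!r}")
        if descriptor in harmonised:
            raise ValueError(f"duplicate entries for descriptor {descriptor}")
        harmonised[descriptor] = value
    return harmonised
```

With this table, records from two sources such as {"Temp": 298.15} and {"T": 310.0} both end up under the single key TEMPERATURE, which is what allows them to be stored in one data bank. A real ontology would also carry units, definitions and relations between terms, which this sketch leaves out.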
PA: True, but what about the time and cost for developing such ontologies?
IS: If the problem is of high value, the pharmaceutical industry does not care about cost and time. It usually takes 12 years and billions of dollars to bring a drug to the market. If this can be accelerated by using well-maintained data banks, FTE costs become less relevant. Moreover, such problems are usually outsourced to specialist companies (like Ontoforce or SciBite), so that a company's own employees can focus on other problems.
DM: However, the relative importance of such costs would be greater for small and medium-sized companies or, for that matter, wherever the value of the market is significantly smaller.
DM: Are current machine learning methods already good enough to analyse such huge data sets?
IS: Definitely. At the moment, we can load, evaluate and process petabytes of data. Common examples are business cases from companies like Google or Amazon. Interestingly, Google has already published its open-source software package TensorFlow, a fast-growing suite of relevant and efficient machine learning routines, thereby promoting the use of machine learning in all industrial sectors. However, meaningful machine learning algorithms rely on testing and validation stages. After sufficient training, decisions and outcomes can be obtained within extremely short timescales. If data storage or the amount of data is limited, the efficiency of the algorithms may be significantly reduced.
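The testing and validation stages mentioned here usually start from splitting the available data into separate training, validation and test subsets. The following is a minimal sketch of such a split using only the Python standard library; the function name split_dataset, the default fractions and the fixed seed are assumptions made for illustration and are not taken from the discussion.

```python
import random


def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle `samples` and split them into (train, validation, test) lists.

    The fractions are illustrative defaults; the seed makes the shuffle
    reproducible so that the same split can be regenerated later.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)

    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_val - n_test

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test
```

The model is fitted on the training subset, tuned against the validation subset, and assessed once on the test subset, which it has never seen. With very little data, each subset becomes too small to be informative, which is one way in which limited data reduces the usefulness of the algorithms.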
DM: Various forms of simulation/modelling are performed. Can you tell us a bit about them, and about the current and possible future role of molecular/mesoscopic or multiscale simulation?
IS: Molecular or particle-based models are not often used for upstream and downstream processes, which include various production and purification steps based on physical mechanisms. This is related to the high complexity of process steps such as filtration, centrifugation or chromatography. Furthermore, the time and length scales to be simulated are often too large for atomistic models. Most often, continuum approaches, so-called mechanistic models, or combinations of mechanistic models with machine learning, so-called hybrid models, are currently used. In addition, computational fluid dynamics solved by finite-element and lattice-Boltzmann techniques is growing in importance. Particle-based simulations are mainly used in research, in particular in the design of drugs. In fact, particle-based or quantum-mechanical considerations for new drugs are already well established in research. For an optimal choice of experimental conditions, approaches like the design of experiments and advanced statistical analysis methods are also becoming significantly more important.
PA: By advanced statistical analysis methods, do you mean uncertainty quantification (UQ)?
IS: Among others, I meant Bayesian design of experiments, as well as active machine learning.
DM: How can massive computer parallelism best support this future role? What is more important, the possibility to simulate very large systems, or high throughput computing for moderately sized systems?
IS: Parallel computing and GPU clusters foster the fast and efficient simulation of systems. This is particularly important for computational fluid dynamics. In research, atomistic simulations and DFT calculations can be significantly accelerated. Today, high-throughput screening is more important but, in my opinion, future development processes may also rely on the simulation of large systems for troubleshooting.
DM: What aspects of simulation need to be improved?
IS: Improvement is important in all aspects. The efficient and fast evaluation of algorithms, as well as the development of novel approaches, will become increasingly important. Moreover, refined classical, polarisable, coarse-grained and reactive force fields will also attract the interest of industrial researchers.
DM: How can one facilitate training of the people needed to generate, maintain and extract meaning from large data-sets?
IS: A lot of well-trained and educated data scientists are hired from universities. However, the main idea is that regular employees can also work with advanced approaches by using intuitive graphical user interfaces. Hence, specialists will be trained during their academic education or as Ph.D. students, whereas making routines intuitive to handle for regular employees is one of the major challenges for the future.
1 “Industry 4.0” alludes to the concept of advanced factories of the future in which machines are augmented with wireless connectivity and sensors, connected to a system that can visualise the entire production line and make decisions on its own.