Machine Learning: Theoretical Fundamentals and Its Possibilities for Pharmaceuticals and Life Sciences
Artificial intelligence, machine learning and predictive analytics are terms used frequently by IT and industry experts. They are presented as the main strategic and technological development that progressive companies should strive to implement throughout their range of activities within the coming years. Propelled by operating platforms with integrated AI components from major tech companies such as Google, SAP, IBM and Amazon (AWS), now is the time for IT professionals to proactively investigate what benefits these technologies can bring to their own or their customers’ business. According to the International Data Corporation (IDC), worldwide spending on artificial intelligence within corporate environments is forecast to reach $77.6B in 2022 (Forbes, 2019). Developments are especially numerous in the domains of maintenance, human resources, procurement and marketing.
However, our daily practice teaches us that the majority of companies operating in pharmaceutical environments do not yet apply any machine learning component within their business. Even though the opportunities are abundant, the reserved attitude towards new technologies that is inherent to GMP industries is one of the main reasons this industry is falling behind in the AI domain. We typically see that it is quite difficult to grasp these technologies, to identify where the possibilities for creating value lie, and to jump-start the implementation of artificial intelligence technologies while still adhering to the imposed regulations.
In this blog series we will try to answer these questions from our perspective as an SAP consultancy firm with a strong focus on the GMP industry. By focusing on these technologies and their applicability, in combination with tacit knowledge of the industry, we hope to give a holistic view of the possibilities for applying these new technologies within pharmaceutical companies. The series consists of three episodes, each with a distinct subject that will guide you on your machine learning journey. This part focuses on the fundamentals of artificial intelligence and the requirements for identifying opportunities for a use case.
In blog 2 of this series we will reveal how to develop this potential use case for predictive analytics, and how to demonstrate its effectiveness via a proof of concept. The final episode (3) of the series will discuss what ingredients are required to take the step and implement AI models within a pharmaceutical environment, and what to consider when formulating an AI strategy.
Buzzword Bingo – AI, Predictive Analytics and ML
Back to the terms from our introduction. So, what is the difference between all these fancy buzzwords?
First of all, let’s discuss Artificial Intelligence. Artificial Intelligence gives an IT component the ability to ‘think and learn’, something we associate with human capabilities. It is the umbrella term for all technologies that aim at making an IT component intelligent, hence its name. AI comprises various subdomains that each focus on a different aspect, ranging from basic prescriptive analytics, through reinforced prediction models, to sophisticated deep-learning algorithms that can, for example, give a camera cognitive capabilities such as face recognition. However, the definition of artificial intelligence is quite broad and often leads to more confusion than clarity about the actual purpose of the subject at hand.
Already more specific is the term ‘Predictive Analytics’ (PA). These are analytical tools that typically support companies in making decisions on the basis of ‘what might happen next’. By integrating statistical formulas into reporting functionalities, probability calculations can be made with the help of various techniques. PA looks at which input and output variables depend on each other in order to select a statistical model. Then, by ingesting input variables, the model can calculate the statistical probability that a certain predetermined event occurs. This shows that these predictive models can never actually predict the real future, but only give a probability of what might happen. Furthermore, these models can be perfectly static, without improving over time.
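To make the idea of ‘the statistical chance of a predetermined event’ concrete, here is a minimal sketch of such a static model. It assumes the readings are roughly normally distributed; the temperature values, the limit and the function name exceedance_probability are all hypothetical and chosen purely for illustration.

```python
from statistics import NormalDist, mean, stdev


def exceedance_probability(history, limit):
    """Estimate the chance that the next reading exceeds `limit`.

    Assumes the readings follow a normal distribution, which is an
    illustrative simplification and not a property of real process data.
    """
    model = NormalDist(mu=mean(history), sigma=stdev(history))
    return 1.0 - model.cdf(limit)


# Hypothetical cleanroom temperature readings in degrees Celsius.
readings = [20.1, 20.4, 19.8, 20.0, 20.3, 19.9, 20.2, 20.1]

p = exceedance_probability(readings, limit=20.5)
print(f"Estimated chance of exceeding the limit: {p:.1%}")
```

Note that the result is a probability and not a forecast of the actual next value, and that the model stays the same until someone recalculates it on newer data.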
Machine Learning tackles this static aspect. It is a branch within the predictive analytics domain that is dedicated to the development of self-reinforcing algorithms. Self-reinforcing algorithms allow predictive models to evolve over time, which is exactly where machine learning differs from predictive analytics in general. Machine learning models can adapt when the volume or accuracy of the data increases. On the basis of these new insights, statistical calculations give an IT component the possibility to ‘learn’. A basic example is a shift in the averages or variances applied in probability calculations.
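The shift in averages and variances mentioned above can be sketched as follows. This is only an illustration of a model parameter adapting as data arrives, using Welford's online algorithm; the class name RunningStats and the readings are assumptions, not part of any specific product.

```python
class RunningStats:
    """Keeps a mean and variance up to date as new readings arrive
    (Welford's online algorithm), without storing the full history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two readings.
        return self._m2 / (self.count - 1) if self.count > 1 else float("nan")


stats = RunningStats()

# Hypothetical early readings.
for reading in [20.1, 20.4, 19.8, 20.0]:
    stats.update(reading)
early_mean = stats.mean

# Hypothetical later readings after the process has drifted upwards.
for reading in [20.6, 20.7, 20.5, 20.8]:
    stats.update(reading)

print(f"mean moved from {early_mean:.3f} to {stats.mean:.3f}")
```

Any probability calculation that uses these statistics changes automatically with every new reading, which is the simplest form of the ‘learning’ described above.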
Key to generating accurate predictive models is having extensive datasets containing both input and output variables, also known as independent and dependent variables respectively. These inputs and outputs are required in the process called modelling. This is something we have all done in high school: analyzing input and outcome variables in order to determine the equation (or formula) that describes their relationship. Predictive modelling is essentially the same, except that the datasets are typically significantly larger. By analyzing and processing vast amounts of collected input and output data, a model can be generated that makes it possible to ‘predict’ outcomes on the basis of inputs with the help of statistical probability calculations. Nonetheless, please be aware that the identified relationships among variables do not always make sense. Tacit process knowledge is required in order to validate relationships between input and output variables.
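The high-school exercise described above can be sketched in a few lines: given paired inputs and outcomes, find the straight line that fits them best using ordinary least squares. The drying-time and moisture values and the function name fit_line are hypothetical and serve only as an illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: drying time in hours (input) and
# residual moisture in percent (outcome).
drying_hours = [1.0, 2.0, 3.0, 4.0, 5.0]
moisture_pct = [9.1, 7.9, 7.1, 5.9, 5.0]

slope, intercept = fit_line(drying_hours, moisture_pct)

# Use the fitted equation to 'predict' the outcome for a new input.
predicted = slope * 6.0 + intercept
print(f"moisture = {slope:.2f} * hours + {intercept:.2f}")
```

The fitted equation is only as meaningful as the relationship behind it, which is why process knowledge is still needed to judge whether a relationship found this way is plausible.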
The outcome of modelling is an actual model (or equation, in the terms used above). Various statistical techniques can be used for modelling, and the choice depends on which technique best fits the dataset. A typical approach is regression. Both linear and logistic regression are essential algorithms for identifying relationships among variables in large datasets. IBM, a technological leader in the AI domain, provides more in-depth material on statistical modelling techniques.
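Where linear regression predicts a number, logistic regression predicts the probability of a yes/no outcome. The sketch below fits a one-variable logistic model with plain gradient descent, chosen here only because it is short; statistical packages normally use more robust fitting methods. The data, the function names and the learning-rate and epoch settings are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, learning_rate=0.1, epochs=5000):
    """Fit P(label = 1 | x) = sigmoid(w * x + b) by batch gradient
    descent on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical data: deviation from the target temperature in degrees
# Celsius, and whether the batch failed its quality check (1) or passed (0).
deviation = [0.1, 0.3, 0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.4]
failed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(deviation, failed)

p_small = sigmoid(w * 0.2 + b)  # estimated failure chance at a small deviation
p_large = sigmoid(w * 2.2 + b)  # estimated failure chance at a large deviation
print(f"failure chance: {p_small:.0%} at 0.2 degrees, {p_large:.0%} at 2.2 degrees")
```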
Technological Drivers for Predictive Analytics and Machine Learning
As already stated in the introduction, current technological advances make this very moment the right time to start applying predictive analytics or even machine learning in a business context. It is therefore important to know what the exact drivers are, since these also provide the basis for the datasets that allow predictive modelling and its improvement.
First of all, the beginning of the Internet of Things era can be identified as a driving force. A significant number of the business processes that can be reinforced with AI are sensor-driven. Sensors are able to autonomously collect the input data that is ingested by the predictive model, so these devices are well suited to predictive solutions. The IoT hype sparked an ever-increasing production of both smarter and less expensive sensor modules. We see countries such as China effectively making use of economies of scale to drive down sensor prices, which makes it possible to increase the number of sensors in a given environment and collect additional data. Previously costly analytical equipment can now be offered for a fraction of the original price because of technological developments. New technologies also make it possible to develop more sophisticated sensor modules. Think of sensors that can measure variables that could not be measured before, or newer sensor technologies that can analyze a set of variables faster, more accurately and at smaller intervals.
Another technological advance that triggered the machine learning boom is the development of database systems and cloud hosting. Larger databases, smarter infrastructures, external hosting possibilities and faster technologies such as SAP’s in-memory HANA database are all key to ingesting, calculating and presenting data at ever-increasing speeds. Techniques like hot and cold storage of data, hosting via cloud platforms and big data frameworks such as Hadoop now make it possible to process vast amounts of unstructured data from various sources, which is key to the development of predictive models. The smaller the lag between analyzing the data and presenting the results, the more effective the resulting decision. A typical OLAP system with a single daily data update illustrates the opposite: decisions end up being made on information that is already outdated.
Finally, at the machine learning end we also see rapid movement. Especially due to the high level of interest in this domain from tech giants such as IBM, AWS and Google, predictive tooling is becoming increasingly user-friendly. Frameworks and development tools such as TensorFlow and PyCharm lower the barrier to building models, and some platforms even allow drag-and-drop-like modelling, which reduces the need for hardcore data scientists in the data modelling process. These tech giants also offer more dedicated predictive models out of the box as SaaS via cloud platforms. This is a need that SAP understands perfectly, as illustrated by the basic predictive models it provides as standard within its Predictive Maintenance and Service (PdMS) solution.
Predictive Analytics in Pharmaceutical Companies
What makes a highly controlled and regulated industry like life sciences the right place to implement such technologies?
The most obvious reason is that quality is paramount within this industry. Because quality is of critical importance in GMP environments, pharmaceutical companies tend to have specific methods for measuring and controlling the quality of their final products. These companies are also keen on keeping environmental and other influencing parameters, such as those required in cleanrooms, under control. A common technique for keeping track of both product and environmental parameters is sampling, which can be a completely manual exercise (such as product sampling by hand) or partially or fully automated (e.g. environmental monitoring systems).
Since sampling is the process of taking a subset of the population in order to check relevant quality parameters, a lot of valuable information is gathered over the years of sampling. So, on the one hand we have the sample results as our outcome variables. On the other hand, there is already an extensive amount of instrumentation present on production equipment, utility systems and regulated areas (e.g. cleanrooms and warehouses) in the form of sensors. These sensors collect the data that represent the input variables. Quite often, input and outcome data collected over long periods of time are already present at pharmaceutical plants, and we therefore regard the database systems containing this data as a gold mine for predictive modelling.
Based upon this available data, predictive analytics will allow you to apply predictive quality management. By putting both input (sensor information, lab data, maintenance information) and output (sample outcomes) in a time-series database, the dataset will allow you to investigate correlations among the independent and dependent variables: predictive modelling. Based upon the identified relationships among variables, a variety of predictive models can be generated, from simple anomaly detection algorithms that look at the variances across variables to more dedicated techniques such as multivariate regression. Consequently, AI can play a crucial role in the early identification of problems across input variables, which eventually allows you to ‘predict’ the future quality of the output.
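The simplest of the models mentioned above, variance-based anomaly detection, can be sketched as follows. The approach flags any new reading that lies more than a chosen number of standard deviations from a historical baseline. The pressure values, the threshold of 3 and the function name flag_anomalies are assumptions made for illustration; a real monitoring setup would need validated limits.

```python
from statistics import mean, stdev


def flag_anomalies(baseline, new_readings, threshold=3.0):
    """Return (index, value) for each new reading that lies more than
    `threshold` baseline standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [
        (i, value)
        for i, value in enumerate(new_readings)
        if abs(value - mu) > threshold * sigma
    ]


# Hypothetical cleanroom differential pressure readings in pascal.
baseline_pressure = [15.0, 15.2, 14.9, 15.1, 15.0, 14.8, 15.1, 14.9]
incoming = [15.0, 15.1, 13.2, 14.9]

anomalies = flag_anomalies(baseline_pressure, incoming)
print(anomalies)
```

A flagged reading on an input variable is an early signal that can be investigated before the corresponding sample result is available.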
Identifying AI Opportunities
In order to be successful in identifying opportunities for the application of AI in your company, you should be well aware of various factors:
- Know your data. Your current data, distributed across various database systems, can be a treasure trove for predictive modelling. Think of sample outcomes and sensor recordings: the input and outcome data that are both required to generate a predictive model. However, also be aware of potential pitfalls. What is the quality of these datasets? Do the results come from trustworthy (e.g. calibrated) instrumentation? Are there any gaps over time?
- Understand the various models and their applications. In order to determine whether the available data can be put to use, you need an overall idea of the possible models. See it as a puzzle: without knowing the pieces, it is impossible to understand what the final picture should look like. If you know what ingredients are needed to generate a predictive model, you are able to understand which models can be applied in a specific business scenario.
- Be aware of technological advances. On the database end we see rapid improvement in ingestion, storage and processing techniques, all of which are required to successfully analyze the data and calculate outcomes via the predictive model. Advances in the sensor domain allow for inexpensive instruments, faster and more accurate sampling and, last but not least, the monitoring of variables that previously could not be measured. These newer technologies can provide the final puzzle piece required to build an effective ML solution.
- Dare to investigate. It can be hard to get a proper idea of the added value of implementing a machine learning model. We therefore encourage you to apply design thinking and make clever use of a proof of concept to determine the added value before triggering a costly and time-consuming full-scope project.
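As an illustration of the ‘Know your data’ point above, the check for gaps over time can be sketched as a small script. It assumes readings are expected at a fixed interval; the log, the 10-minute interval, the tolerance factor and the function name find_gaps are all hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where two consecutive readings are
    further apart than `tolerance` times the expected interval."""
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


# Hypothetical sensor log: one reading every 10 minutes,
# with readings missing between 08:20 and 09:00.
start = datetime(2020, 1, 1, 8, 0)
minutes = [0, 10, 20, 60, 70, 80]
log = [start + timedelta(minutes=m) for m in minutes]

gaps = find_gaps(log, expected_interval=timedelta(minutes=10))
for gap_start, gap_end in gaps:
    print(f"no data between {gap_start:%H:%M} and {gap_end:%H:%M}")
```

Running a check like this on historical data before any modelling starts gives a quick first impression of whether a dataset is complete enough to be worth a proof of concept.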
So much for the theoretical part. In the next blog we will take a more practical approach and take you along on our endeavor to apply a machine learning solution in a pharmaceutical context. With the help of the theoretical foundation laid in this part, we will explain how we identified an opportunity and show how a proof of concept revealed the potential value, which can ultimately provide the basis for applying machine learning in real-life GMP business contexts.