Sylvia Dyballa - 06 September 2024
Data Science in the Drug Discovery Industry
How Does ZeClinics Take Advantage of Data Science?
This is a story about the role of data science in accelerating the drug discovery process, with real examples from our company. We are Sylvia Dyballa, Technology Director, and Javier Terriente, Chief Innovator Officer and Founder at ZeClinics, and we would like to share how we use data science and artificial intelligence in the biotech industry.
First of all, we want to introduce the important biological and experimental advantages that the zebrafish has over other preclinical models, which can be summarised in two main benefits: biological translatability and big data acquisition possibilities. The combination of both places zebrafish in a sweet spot for implementing data science (DS) and artificial intelligence (AI) tools. We employ DS/AI to automate and accelerate phenotypic and omic analyses. Moreover, DS/AI can be applied to discover new biological paradigms to tackle disease. Based on those potential applications, ZeClinics is implementing AI from different angles.
Data Science Use Case 1: Deep learning for automated phenotyping
Deep learning for automatic segmentation of the zebrafish heart chambers
We are continuously generating Deep Learning (DL) models to achieve case-specific phenotypic data analysis. An emblematic example is ZeCardioAI, a tool that allows automatic segmentation of the atrium and ventricle of zebrafish larvae hearts from videos.
ZeCardioAI extracts the change in area of each segmented region across the duration of the video. We can therefore measure how the heart beats in an automated manner and without human bias. This is translated into disease-relevant parameters concerning heart rhythm (BPM, arrhythmias, etc.) and also potential contractility defects (chamber size, ejection fraction, strain defects, etc.).
Figure 1. Heart chamber segmentation. After training a DL model to segment the atrium (green) and the ventricle (blue), we can predict those structures in the video and extract physiological parameters of the heart such as heart rate, arrhythmias, ejection fraction, etc.
Those phenotypes serve:
- To quantify the impact of cardiotoxic drugs
- To analyze cardiomyopathy disease models
- To validate and discover new therapeutic targets and drugs to treat cardiac-related diseases
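The step from chamber-area traces to rhythm and contractility parameters can be sketched as follows. This is a minimal illustration, not ZeCardioAI's actual method: the function name `heart_metrics`, the mean-crossing beat detection, and the use of fractional area change as a 2-D stand-in for ejection fraction are all assumptions, and the input is a synthetic trace rather than real data.

```python
import numpy as np

def heart_metrics(area, fps):
    """Estimate rhythm and contractility proxies from a chamber-area trace.

    area: segmented chamber area per video frame (hypothetical input)
    fps:  video frame rate in frames per second
    """
    area = np.asarray(area, dtype=float)
    centered = area - area.mean()
    # Count one beat at each upward crossing of the mean area.
    above = centered > 0
    beat_frames = np.flatnonzero(~above[:-1] & above[1:]) + 1
    if len(beat_frames) < 2:
        raise ValueError("need at least two beats to estimate a rate")
    intervals = np.diff(beat_frames) / fps  # seconds between consecutive beats
    bpm = 60.0 / intervals.mean()
    # Beat-to-beat variability; large values may hint at an irregular rhythm.
    interval_cv = intervals.std() / intervals.mean()
    # Relative area change between the largest and smallest chamber area.
    fac = (area.max() - area.min()) / area.max()
    return {"bpm": bpm, "interval_cv": interval_cv, "fractional_area_change": fac}

# Synthetic trace: a 2 Hz oscillation (120 beats per minute) at 100 frames per second.
t = np.arange(500) / 100.0
synthetic_area = 1000 + 200 * np.cos(2 * np.pi * 2 * t)
metrics = heart_metrics(synthetic_area, fps=100)
```

On real videos the trace would be noisy, so some smoothing or a more robust peak detector would probably be needed before counting beats.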
Deep learning for semantic segmentation of zebrafish anatomical entities of interest
Another DS/AI-powered tool developed at ZeClinics relates to the field of developmental toxicity. Here we assess the effect of different molecules on zebrafish development, that is, their teratogenic effect. In this context, different models were trained on thousands of manually curated images to achieve either semantic segmentation of regions of interest (morphometrics) or image classification.
In the following example of zebrafish larvae image segmentation, dorsal (top view) and lateral (side view) images are fed to a deep learning architecture. The model was pre-trained on the COCO dataset and fine-tuned on ZeClinics’ data. This project was carried out in collaboration with the Data Science master's program of the Polytechnic University of Barcelona (UPC).
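Once a model has produced a segmentation mask, morphometric readouts can be derived from it with a few array operations. The sketch below assumes a 2-D integer label mask with 0 as background. The function name `region_morphometrics`, the label meanings, and the pixel size are hypothetical and only illustrate the idea.

```python
import numpy as np

def region_morphometrics(mask, pixel_size_um=1.0):
    """Per-label area and bounding-box extent from a 2-D label mask (0 = background)."""
    mask = np.asarray(mask)
    results = {}
    for label in np.unique(mask):
        if label == 0:
            continue  # skip background
        rows, cols = np.nonzero(mask == label)
        results[int(label)] = {
            "area_um2": rows.size * pixel_size_um ** 2,
            "length_um": (cols.max() - cols.min() + 1) * pixel_size_um,
            "height_um": (rows.max() - rows.min() + 1) * pixel_size_um,
        }
    return results

# Toy mask standing in for a model prediction.
toy_mask = np.zeros((6, 10), dtype=int)
toy_mask[2:4, 1:9] = 1  # hypothetical label 1, e.g. a body region
toy_mask[2:4, 1:3] = 2  # hypothetical label 2, e.g. an eye, overwriting part of label 1
morpho = region_morphometrics(toy_mask, pixel_size_um=2.0)
```

Comparing such measurements between treated and control larvae is one plausible way to quantify a morphological change.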
Data Science Use Case 2: Machine learning for high-throughput screening (HTS) of candidate drugs
At ZeClinics we also use Machine Learning (ML) to build classifiers that predict the possible toxicity of compounds when larvae are incubated with potentially toxic drugs. This approach is based on the assumption that some toxicity-related relationships can only be found by identifying patterns hidden in large and complex data sets. These relationships are usually inaccessible to the human experimenter without advanced mathematical models.
As such, we train our ML algorithms on sets of phenotypes extracted from hundreds of experimental samples with known toxicity. Once trained and validated, those classifiers can be used to predict toxicity for new compounds. As an example, we are interested in understanding whether a drug is teratogenic, i.e. whether it has the potential to cause defects in the fetus upon exposure. In this particular case, dorsal and lateral images of zebrafish larvae incubated with potentially toxic compounds are fed to a ResNet-101–based classifier. The model was pre-trained on the ImageNet dataset and fine-tuned on ZeClinics’ data. Once trained and validated, the model predicts a confidence score from 0 to 1 for each phenotype, on the basis of which we assign toxicity.
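The last step, going from per-phenotype confidence scores to a toxicity call, could look like the sketch below. The source does not state the decision rule, so the 0.5 threshold, the "any phenotype above threshold" rule, the function name `assign_toxicity`, and the phenotype names are all assumptions made for illustration.

```python
def assign_toxicity(scores, threshold=0.5):
    """Turn per-phenotype confidence scores (0 to 1) into a toxicity call.

    scores:    mapping of phenotype name to model confidence (hypothetical names)
    threshold: assumed cut-off; in practice it would be tuned on validation data
    """
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} is outside [0, 1]: {score}")
    # Flag every phenotype whose confidence reaches the threshold.
    flagged = sorted(name for name, score in scores.items() if score >= threshold)
    return {"toxic": bool(flagged), "flagged_phenotypes": flagged}

# Made-up scores for one compound, not real model output.
example_scores = {"edema": 0.91, "tail_curvature": 0.12, "eye_size": 0.55}
call = assign_toxicity(example_scores)
```

A stricter or phenotype-specific threshold would trade sensitivity for fewer false alarms, which is a choice that depends on how the screen is used.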
Data Science Use Case 3: Knowledge Graphs (KG) for therapeutic target discovery
Yet another use of AI in our company is the development of Knowledge Graphs (KG). KGs are networks composed of nodes (with different labels and properties) and relationships (edges of distinct types and with specific properties). A node in a biomedical KG can have a label such as gene/protein, disease/phenotype or compound/drug. The relationships can have types such as “induces”, “activates”, “cures”, etc. So two proteins could be connected by the relationship Protein A → “activates” Protein B, or a gene can be related to a disease as Gene 1 → “is overexpressed in” Disease X.
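The node and relationship structure described above can be sketched with plain Python data structures. This is an illustrative toy and not the implementation behind the ZeClinics KG: the class names, the `evidence` property, and the example entities are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    label: str  # e.g. "protein", "gene", "disease", "compound"

class KnowledgeGraph:
    """Minimal labelled property graph: typed, directed edges with properties."""

    def __init__(self):
        self._out = defaultdict(list)  # source node -> [(rel_type, target, props)]

    def add(self, source, rel_type, target, **props):
        self._out[source].append((rel_type, target, props))

    def neighbors(self, source, rel_type=None):
        """Targets reachable from source, optionally filtered by relationship type."""
        return [
            target
            for rel, target, _props in self._out.get(source, [])
            if rel_type is None or rel == rel_type
        ]

protein_a = Node("Protein A", "protein")
protein_b = Node("Protein B", "protein")
gene_1 = Node("Gene 1", "gene")
disease_x = Node("Disease X", "disease")

kg = KnowledgeGraph()
kg.add(protein_a, "activates", protein_b, evidence="literature")
kg.add(gene_1, "is overexpressed in", disease_x, evidence="internal experiment")

activated = kg.neighbors(protein_a, "activates")
```

At the scale of thousands of nodes and millions of edges, a dedicated graph database or graph library would normally replace this in-memory dictionary.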
Knowledge graphs are an elegant way to represent complex systems with many heavily interconnected components. In this way, they are very powerful tools in drug discovery. By using a KG we can combine the body of public information with our own experimental data to gain insights about new targets that can be important hubs in a specific disease.
Our DrugDiscovery KG is still under development, but it will comprise all the accumulated understanding of the relationships between targets, drugs and diseases. This will provide a massive network, with thousands of nodes and millions of edges, that helps us frame precisely the knowledge reported in the scientific literature as well as the knowledge derived from our own research. Through the combination of external and internal data we aim to identify new therapeutic paradigms, which would be difficult to access by traditional scientific methods.
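One simple way to look for hubs after combining public and internal relationships is to rank nodes by how many edges touch them. The sketch below uses plain degree counting, which is an assumed stand-in for whatever analysis the real KG will support. The function name `rank_hubs` and all entities and edges are invented for illustration.

```python
from collections import Counter

def rank_hubs(*edge_lists):
    """Merge edge lists of (source, relation, target) triples and rank nodes by degree."""
    edges = set()
    for edge_list in edge_lists:
        edges.update(edge_list)  # duplicates across sources are counted once
    degree = Counter()
    for source, _relation, target in edges:
        degree[source] += 1
        degree[target] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))

public_edges = [
    ("Gene 1", "is overexpressed in", "Disease X"),
    ("Protein A", "activates", "Protein B"),
]
internal_edges = [
    ("Protein A", "activates", "Gene 1"),
    ("Compound C", "inhibits", "Protein A"),
]
hubs = rank_hubs(public_edges, internal_edges)
```

In this toy example Protein A ranks first because it takes part in three relationships across the two sources.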
Final words
ZeClinics was born as a purely experimental research company. However, we soon realized that, besides our experimental expertise, the wealth of experimental data acquired during the life of the company was our most valued asset… if it were fully exploited.
We believe that combining our experimental data with multiple AI-based tools can have an impact on advancing drug discovery. Nowadays, we can neither claim to be “only” an experimental company, nor should we call ourselves a digital one. Our activities are multidisciplinary, at the intersection of biology, toxicity, data and computer science and, of course, artificial intelligence.
By having experimental and digital competencies, we are able to generate vast amounts of experimental data in the lab and to analyze and integrate them efficiently using AI tools. This way, we can uncover new biological insights and make better predictions on drug outcomes. Finally, since we have the experimental capacities, we can go back to the lab to test whether those hypotheses hold true when confronted experimentally. This virtuous cycle combines the best of two worlds and, in our view, is the best path to make a mark and discover new therapeutics, which will be more efficacious, safer and will reach patients earlier. Because, in the end, the hows will always matter, but we should never forget about the whys.
By Sylvia Dyballa
Sylvia studied Biochemistry at the University of Tübingen (Germany). She then became interested in cellular systems and how cells behave to form tissues and organisms. So, after graduation she moved to Barcelona to pursue her PhD in Developmental Biology with Cristina Pujades at the UPF. During her PhD she started working with zebrafish, and she used live imaging and cell lineage reconstruction to generate a “digital” zebrafish organ. After her PhD she teamed up with ZeClinics to contribute her experience in live imaging and in image and data processing. Today she is Head of Technology and Development at ZeClinics, and her aim is to push innovation and growth at ZeClinics through the use of emerging technologies.