These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Multitask Bioactivity Predictions Using Structural Chemical and Cell Morphology Information

submitted on 26.06.2020, 12:04 and posted on 29.06.2020, 11:14 by Maria-Anna Trapotsi, Ian Barrett, Lewis Mervin, Avid M. Afzal, Noé Sturm, Ola Engkvist, Andreas Bender

Understanding the Mechanism-of-Action (MoA) of compounds and predicting potential drug targets play an important role in small-molecule drug discovery. The aim of this work was to compare chemical and cell morphology information for bioactivity prediction. The comparison used bioactivity data from the ExCAPE database, image data from the Cell Painting data set (the largest publicly available data set of cell images, with approximately 30,000 compound perturbations) and Extended Connectivity Fingerprints (ECFPs), with the multitask Bayesian Matrix Factorisation (BMF) approach Macau. We found that the performance of BMF Macau and Random Forest (RF) was similar overall when ECFPs were used as compound descriptors. However, BMF Macau outperformed RF in 155 out of 224 target classes (69.2%) when image data were used as compound information. Using BMF Macau, 100 (about 45%) and 90 (about 40%) of the 224 targets were predicted with high predictive performance (AUC > 0.8) with ECFP data and image data as side information, respectively. Some targets were better predicted with image data as side information, such as β-catenin, and others with fingerprint-based side information, such as proteins belonging to the G-Protein Coupled Receptor 1 family, which could be rationalized from the underlying data distributions in each descriptor domain. In conclusion, both cell morphology changes and structural chemical information contain information about compound bioactivity; this information is partially complementary and can hence contribute to in silico mechanism of action analysis.
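
The per-target comparison reported above (counting targets where one model beats another, and targets with AUC > 0.8) can be illustrated with a minimal sketch. This is not the authors' code: the function names `roc_auc` and `summarise`, the toy labels and scores, and the target names are hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than any particular library.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties between an active and an inactive score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarise(auc_a, auc_b, threshold=0.8):
    """Compare per-target AUCs of two models (dicts keyed by target name)."""
    targets = sorted(set(auc_a) & set(auc_b))
    a_wins = sum(1 for t in targets if auc_a[t] > auc_b[t])
    a_high = sum(1 for t in targets if auc_a[t] > threshold)
    return {
        "n_targets": len(targets),
        "a_wins": a_wins,
        "a_wins_pct": 100.0 * a_wins / len(targets),
        "a_high": a_high,
    }


# Hypothetical toy data for a single target (1 = active, 0 = inactive).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.7, 0.4, 0.8, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)

# Hypothetical per-target AUCs for two models, A and B.
model_a = {"target_1": 0.85, "target_2": 0.70, "target_3": 0.90}
model_b = {"target_1": 0.80, "target_2": 0.75, "target_3": 0.60}
summary = summarise(model_a, model_b)

# Percentages quoted in the abstract, recomputed from the reported counts.
pct_outperform = round(100 * 155 / 224, 2)
pct_high_ecfp = round(100 * 100 / 224, 1)
pct_high_image = round(100 * 90 / 224, 1)
```

The last three lines only recompute the percentages from the counts given in the abstract (155, 100 and 90 out of 224 targets); everything else runs on made-up numbers.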


Using Heterogeneous Information Sources for Understanding the Mode of Action of Compounds

Biotechnology and Biological Sciences Research Council



Email Address of Submitting Author


University of Cambridge



ORCID For Submitting Author


Declaration of Conflict of Interest

None declared

Version Notes

Version 1 of the manuscript