Multitask Bioactivity Predictions Using Structural Chemical and Cell Morphology Information

Understanding the Mechanism-of-Action (MoA) of compounds and predicting potential drug targets play an important role in small-molecule drug discovery. The aim of this work was to compare chemical and cell morphology information for bioactivity prediction. The comparison used bioactivity data from the ExCAPE database, with either image data from the Cell Painting data set (the largest publicly available data set of cell images, with approximately 30,000 compound perturbations) or Extended Connectivity Fingerprints (ECFPs) as compound side information in the multitask Bayesian Matrix Factorisation (BMF) approach Macau. We found that the performance of BMF Macau and Random Forest (RF) was similar overall when ECFPs were used as compound descriptors. However, BMF Macau outperformed RF in 155 out of 224 target classes (69.20%) when image data were used as compound information. With BMF Macau, 100 (about 45%) and 90 (about 40%) of the 224 targets were predicted with high performance (AUC > 0.8) using ECFP data and image data as side information, respectively. Some targets, such as β-catenin, were better predicted with image data as side information, whereas others, such as proteins belonging to the G-Protein Coupled Receptor 1 family, were better predicted with fingerprint-based side information; these differences could be rationalized from the underlying data distributions in each descriptor domain. In conclusion, both cell morphology changes and structural chemical information carry information about compound bioactivity. This information is partially complementary and can hence contribute to in silico mechanism of action analysis.
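
The per-target comparison summarised above (how many targets one model predicts better than another, and how many exceed AUC > 0.8) can be illustrated with a minimal sketch. It is not the code used in this work: the function names `roc_auc` and `summarise` are hypothetical, the AUC is computed with the standard rank-based (Mann-Whitney) definition, and the AUC dictionaries passed in would come from whatever models are being compared.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random active compound
    is scored above a random inactive one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarise(auc_a, auc_b, threshold=0.8):
    """Compare two models given {target: AUC} dictionaries (hypothetical helper).

    Returns how many shared targets model A predicts better than model B,
    and how many targets model A predicts above the AUC threshold.
    """
    targets = auc_a.keys() & auc_b.keys()
    n = len(targets)
    a_better = sum(auc_a[t] > auc_b[t] for t in targets)
    a_high = sum(auc_a[t] > threshold for t in targets)
    return {
        "n_targets": n,
        "a_better": a_better,
        "a_better_pct": round(100 * a_better / n, 2),
        "a_high": a_high,
        "a_high_pct": round(100 * a_high / n, 2),
    }
```

Applied to the counts reported here, the same arithmetic gives 155/224 = 69.20%, 100/224 ≈ 44.6% and 90/224 ≈ 40.2%.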