Comparing Massively-Multitask Regression Algorithms for Drug Discovery

14 March 2025, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Massively-Multitask Regression Models (MMRMs) trained on millions of compounds and many thousands of assays can predict bioactivity with accuracy comparable to 4-concentration IC50 experiments. Recent advances in hardware and algorithms have produced a variety of methods for multitask modeling. This report compares the performance of six MMRM algorithms: Profile-QSAR (pQSAR), Alchemite, a meta learner (MetaNN), a multitask feed-forward neural network (MT-DNN), Bayesian factorization with side information (Macau) and Inductive Matrix Completion (IMC). To ensure a fair comparison, each method was trained by an expert in that method, in several cases one of its authors. All used the same sets of 159 kinase and 4276 diverse ChEMBL assays, with the same realistically novel training/test set splits. MMRMs generally performed much better than a benchmark of single-task random forest regression models for our use case of virtually screening the compound collection on which the models were trained. The comparison was complicated because methods that train all models simultaneously must leave out the test-set measurements for all assays to avoid test-set leakage, here 75% of measurements. MMRMs that train models one at a time need only leave out data for each assay as it is trained, training on 99+% of the data. This does not affect the accuracy of the final production models trained on 100% of the data, but it does affect evaluation of how the final models will perform. The comparisons therefore included three training/test set collections: "all-out" models, which leave out all test sets during training; "one-out" models, where practical; and "subset-out" models, which built models for only about 10% of kinase assays or 1% of diverse assays, but could thus train evaluation models on about 90% or 99% of the measurements, respectively. Many methods achieved similar accuracy. However, models trained on only 75% of the data performed much worse than those trained on 99+%.
This indicates that all-out models seriously underestimate the performance of the final production models. Subset-out models were closer to one-out. A compromise is to assess the performance of the final models with multiple subset-out models, a more practicable computation for thousands of assays. MMRMs demonstrated little advantage over single-task models for "cold-start" predictions, i.e., novel test-set compounds that are not only unlike the specific assay's training set but also were never tested on any of the other supporting multitask assays. Instead, the accuracy advantage came mainly from imputations within these sparse assay collections: compounds unlike the training set for the assay of interest, but with some measurements on other assays. This implies that MMRMs are best suited for hit-finding, off-target, promiscuity, mechanism-of-action, polypharmacology or drug-repurposing predictions for compounds from the source used to train the overall multitask model. They have little advantage over single-task models, at much higher cost, for virtual screening of vendor archives or exploratory generative chemistry. Given that the accuracy of the final models is often comparable between several of the algorithms, the paper concludes with a detailed discussion of other practical pros and cons of each method that might help in choosing which to employ.
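The difference between the evaluation schemes above comes down to how much of the sparse compound-by-assay measurement matrix must be masked during training. The following sketch illustrates the idea on a toy matrix; the sizes, sparsity, and the 25% test split are illustrative assumptions, not the paper's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse activity matrix: rows = compounds, cols = assays (NaN = unmeasured).
n_compounds, n_assays = 100, 5
Y = rng.normal(size=(n_compounds, n_assays))
Y[rng.random(Y.shape) < 0.8] = np.nan  # high sparsity, typical of assay panels

# Assign a fraction of compounds to the held-out test split (25% here).
test = rng.random(n_compounds) < 0.25

# "All-out": one simultaneous model, so test rows are masked for EVERY assay.
Y_all_out = Y.copy()
Y_all_out[test, :] = np.nan

# "One-out": when modeling assay j, mask only assay j's test measurements;
# the other assays' test-row values remain available as supporting data.
def one_out_training_matrix(Y, test, j):
    Yj = Y.copy()
    Yj[test, j] = np.nan
    return Yj

def kept_fraction(M):
    return np.isfinite(M).sum() / np.isfinite(Y).sum()

print(f"all-out keeps {kept_fraction(Y_all_out):.0%} of measurements")
print(f"one-out (assay 0) keeps {kept_fraction(one_out_training_matrix(Y, test, 0)):.0%}")
```

One-out training always retains at least as much data as all-out, which is why all-out evaluation underestimates the final production models trained on 100% of the data.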

Keywords

drug discovery
multitask regression
virtual screening
model evaluation
imputation

Supplementary materials

Supplementary figures, tables and text
Figure S1: Plots of signed-r2 vs. assays (rank-ordered separately) for each MMRM algorithm, for each assay collection. Figures S2-S7: For each assay collection, trellis plots showing high correlation of signed-r2 for individual assays between each pair of algorithms. Figure S8: Signed-r2 vs. assays (rank-ordered separately) for each algorithm comparing imputations and cold-start predictions. Supplemental Text: MetaNN algorithm details

Details of the analysis comparing signed-r2 between models
“wilcoxen_pvalue” is the p-value before multiple hypothesis testing correction. “mean1-0”/”median1-0” are the mean/median of the signed-r2 of model 1 minus the signed-r2 of model 0. “pvalue_corrected” is the multiple-hypothesis testing corrected p-value. “is_diff” is whether “pvalue_corrected” is greater than 0.05.
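The columns described above suggest a paired Wilcoxon signed-rank test on per-assay signed-r2 values for each pair of algorithms, followed by a multiple-hypothesis correction. A minimal sketch of such an analysis is below; the synthetic per-assay values and the Holm correction method are illustrative assumptions, not taken from the paper's data files.

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)

# Hypothetical per-assay signed-r2 values for two algorithms on the same 159 assays.
r2_model0 = rng.uniform(-0.2, 0.8, size=159)
r2_model1 = np.clip(r2_model0 + rng.normal(0.05, 0.1, size=159), -1.0, 1.0)

# Paired Wilcoxon signed-rank test on the per-assay differences.
stat, pvalue = wilcoxon(r2_model1, r2_model0)
print(f"mean1-0   = {np.mean(r2_model1 - r2_model0):.3f}")
print(f"median1-0 = {np.median(r2_model1 - r2_model0):.3f}")

# With many pairwise algorithm comparisons, correct the p-values jointly.
pvalues = [pvalue, 0.2, 0.04]  # stand-ins for the other model pairs
reject, p_corrected, _, _ = multipletests(pvalues, alpha=0.05, method="holm")
```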

Median signed-r2 for 6 assay collections by 2 to 7 algorithms
Comparison of predicted vs observed signed-r2 for "realistic" test sets across all assays on kinase and diverse assay collections.

Count of assays with signed-r2 > 0.3 for 6 assay collections by 2 to 7 algorithms
Number of assays with signed-r2 > 0.3 for "realistic" test set predictions across all assays on kinase and diverse assay collections.

Details of the analysis comparing statistical significance of “successful” models with signed-r2 > 0.3 between algorithms
“mcnemar_pvalue” is the p-value before multiple hypothesis testing correction. “pvalue_corrected” is the multiple-hypothesis testing corrected p-value. “is_diff” is whether “pvalue_corrected” is greater than 0.05

Details of statistical significance of rate of “successful” models with signed-r2 > 0.3
2x2 contingency tables underlying the McNemar tests for whether there is a different rate of “successful” models with signed-r2 > 0.3
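A 2x2 table of this kind pairs each assay's success/failure under two algorithms, and McNemar's test then compares only the discordant cells. The sketch below shows one way to build such a table and run the test with statsmodels; the per-assay signed-r2 arrays are synthetic stand-ins, while the 0.3 success threshold follows the paper.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(2)

# Hypothetical per-assay signed-r2 for two algorithms on the same 159 assays.
r2_a = rng.uniform(-0.2, 0.9, size=159)
r2_b = r2_a + rng.normal(0.0, 0.15, size=159)

# "Successful" model: signed-r2 > 0.3 (the paper's threshold).
succ_a = r2_a > 0.3
succ_b = r2_b > 0.3

# 2x2 contingency table of paired successes/failures.
table = np.array([
    [np.sum(succ_a & succ_b),  np.sum(succ_a & ~succ_b)],
    [np.sum(~succ_a & succ_b), np.sum(~succ_a & ~succ_b)],
])

# McNemar's test uses only the discordant (off-diagonal) cells.
result = mcnemar(table, exact=True)
print(f"mcnemar_pvalue = {result.pvalue:.3f}")
```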
