Comparing Massively-Multitask Regression Algorithms for Drug Discovery

14 March 2025, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Massively-Multitask Regression Models (MMRMs) trained on millions of compounds and many thousands of assays can predict bioactivity with accuracy comparable to 4-concentration IC50 experiments. Recent advances in hardware and algorithms have produced a variety of methods for multitask modeling. This report compares the performance of six MMRM algorithms: Profile-QSAR (pQSAR), Alchemite, a meta learner (MetaNN), a multitask feed-forward neural network (MT-DNN), Bayesian factorization with side information (Macau) and Inductive Matrix Completion (IMC). To ensure a fair comparison, each method was trained by an expert in that method, in several cases one of its authors. All used the same sets of 159 kinase and 4276 diverse ChEMBL assays, with the same realistically novel training/test set splits. MMRMs generally performed much better than a benchmark of single-task random forest regression models for our use case of virtually screening the compound collection on which the models were trained. The comparison was complicated because methods that train all models simultaneously must leave out the test-set measurements for all assays to avoid test-set leakage, here 75% of measurements. MMRMs that train models one at a time need only leave out data for each assay as it is trained, training on 99+% of the data. This does not affect the accuracy of the final production models trained on 100% of the data, but it does affect evaluation of how the final models will perform. The comparisons therefore included three training/test set collections: "all-out" models, which leave out all test sets during training; "one-out" models, where practical; and "subset-out" models, which built models for only about 10% of kinase assays or 1% of diverse assays, but could thus train evaluation models on about 90% or 99% of the measurements, respectively. Many methods achieved similar accuracy. However, models trained on only 75% of the data performed much worse than those trained on 99+%.
This indicates that all-out models seriously underestimate the performance of the final production models. Subset-out models were closer to one-out. A compromise is to assess the performance of the final models with multiple subset-out models, a more practicable computation for thousands of assays. MMRMs demonstrated little advantage over single-task models for "cold-start" predictions, i.e., novel test-set compounds that are not only unlike the specific assay's training set but also were never tested on any of the other supporting multitask assays. Instead, the accuracy advantage came mainly from imputations within these sparse assay collections: compounds unlike the training set for the assay of interest, but with some measurements on other assays. This implies that MMRMs are best suited for hit-finding, off-target, promiscuity, mechanism-of-action, polypharmacology or drug-repurposing predictions for compounds from the source used to train the overall multitask model. They have little advantage over single-task models, at much higher cost, for virtual screening of vendor archives or exploratory generative chemistry. Given that the accuracy of the final models is often comparable between several of the algorithms, the paper concludes with a detailed discussion of other practical pros and cons of each method that might help in choosing which to employ.
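The difference between the evaluation schemes above comes down to how much of the sparse compound-by-assay measurement matrix must be masked during training. The following sketch illustrates the idea on a toy matrix; the sizes, sparsity, and the 25% test split are illustrative assumptions, not the paper's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse activity matrix: rows = compounds, cols = assays (NaN = unmeasured).
n_compounds, n_assays = 100, 5
Y = rng.normal(size=(n_compounds, n_assays))
Y[rng.random(Y.shape) < 0.8] = np.nan  # high sparsity, typical of assay panels

# Assign a fraction of compounds to the held-out test split (25% here).
test = rng.random(n_compounds) < 0.25

# "All-out": one simultaneous model, so test rows are masked for EVERY assay.
Y_all_out = Y.copy()
Y_all_out[test, :] = np.nan

# "One-out": when modeling assay j, mask only assay j's test measurements;
# the other assays' test-row values remain available as supporting data.
def one_out_training_matrix(Y, test, j):
    Yj = Y.copy()
    Yj[test, j] = np.nan
    return Yj

def kept_fraction(M):
    return np.isfinite(M).sum() / np.isfinite(Y).sum()

print(f"all-out keeps {kept_fraction(Y_all_out):.0%} of measurements")
print(f"one-out (assay 0) keeps {kept_fraction(one_out_training_matrix(Y, test, 0)):.0%}")
```

One-out training always retains at least as much data as all-out, which is why all-out evaluation underestimates the final production models trained on 100% of the data.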

Keywords

drug discovery
multitask regression
virtual screening
model evaluation
imputation

Supplementary materials

Supplementary figures, tables and text
Figure S1: Plots of signed-r2 vs. assays (rank-ordered separately) for each MMRM algorithm, for each assay collection. Figures S2-S7: For each assay collection, trellis plots showing high correlation of signed-r2 for individual assays between each pair of algorithms. Figure S8: Signed-r2 vs. assays (rank-ordered separately) for each algorithm comparing imputations and cold-start predictions. Supplemental Text: MetaNN algorithm details

Details of the analysis comparing signed-r2 between models
“wilcoxen_pvalue” is the p-value before multiple hypothesis testing correction. “mean1-0”/”median1-0” are the mean/median of the signed-r2 of model 1 minus the signed-r2 of model 0. “pvalue_corrected” is the multiple-hypothesis testing corrected p-value. “is_diff” is whether “pvalue_corrected” is greater than 0.05.
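The columns described above suggest a paired Wilcoxon signed-rank test on per-assay signed-r2 values for each pair of algorithms, followed by a multiple-hypothesis correction. A minimal sketch of such an analysis is below; the synthetic per-assay values and the Holm correction method are illustrative assumptions, not taken from the paper's data files.

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)

# Hypothetical per-assay signed-r2 values for two algorithms on the same 159 assays.
r2_model0 = rng.uniform(-0.2, 0.8, size=159)
r2_model1 = np.clip(r2_model0 + rng.normal(0.05, 0.1, size=159), -1.0, 1.0)

# Paired Wilcoxon signed-rank test on the per-assay differences.
stat, pvalue = wilcoxon(r2_model1, r2_model0)
print(f"mean1-0   = {np.mean(r2_model1 - r2_model0):.3f}")
print(f"median1-0 = {np.median(r2_model1 - r2_model0):.3f}")

# With many pairwise algorithm comparisons, correct the p-values jointly.
pvalues = [pvalue, 0.2, 0.04]  # stand-ins for the other model pairs
reject, p_corrected, _, _ = multipletests(pvalues, alpha=0.05, method="holm")
```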

Median signed-r2 for 6 assay collections by 2 to 7 algorithms
Comparison of predicted vs observed signed-r2 for "realistic" test sets across all assays on kinase and diverse assay collections.

Count of assays with signed-r2 > 0.3 for 6 assay collections by 2 to 7 algorithms
Number of assays with signed-r2 > 0.3 for "realistic" test set predictions across all assays on kinase and diverse assay collections.

Details of the analysis comparing statistical significance of “successful” models with signed-r2 > 0.3 between algorithms
“mcnemar_pvalue” is the p-value before multiple hypothesis testing correction. “pvalue_corrected” is the multiple-hypothesis testing corrected p-value. “is_diff” is whether “pvalue_corrected” is greater than 0.05

Details of statistical significance of rate of “successful” models with signed-r2 > 0.3
2x2 contingency tables underlying the McNemar tests for whether there is a different rate of “successful” models with signed-r2 > 0.3
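A 2x2 table of this kind pairs each assay's success/failure under two algorithms, and McNemar's test then compares only the discordant cells. The sketch below shows one way to build such a table and run the test with statsmodels; the per-assay signed-r2 arrays are synthetic stand-ins, while the 0.3 success threshold follows the paper.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(2)

# Hypothetical per-assay signed-r2 for two algorithms on the same 159 assays.
r2_a = rng.uniform(-0.2, 0.9, size=159)
r2_b = r2_a + rng.normal(0.0, 0.15, size=159)

# "Successful" model: signed-r2 > 0.3 (the paper's threshold).
succ_a = r2_a > 0.3
succ_b = r2_b > 0.3

# 2x2 contingency table of paired successes/failures.
table = np.array([
    [np.sum(succ_a & succ_b),  np.sum(succ_a & ~succ_b)],
    [np.sum(~succ_a & succ_b), np.sum(~succ_a & ~succ_b)],
])

# McNemar's test uses only the discordant (off-diagonal) cells.
result = mcnemar(table, exact=True)
print(f"mcnemar_pvalue = {result.pvalue:.3f}")
```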
