Denoising Drug Discovery Data for Improved ADMET Property Prediction

22 April 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Predicting ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of small molecules is a key task in drug discovery. A major challenge in building better ADMET models is the experimental error inherent in the data. Furthermore, ADMET prediction is typically framed as a regression task because the measured data are continuous. This makes existing denoising methods difficult to apply, as most of them focus on classification tasks. Here, we develop denoising schemes based on deep learning to address this problem. We find that the training error can be used to identify noise in regression tasks, whereas ensemble-based and forgotten event-based metrics fail to detect it. The largest performance increase occurs when the original model is finetuned on the denoised data, using training error as the noise detection metric. Our method improves models trained on data with medium noise levels and does not degrade the performance of models whose noise falls outside this range. To our knowledge, our denoising scheme is the first to improve model performance for ADMET data, and it has implications for improving models for experimental assay data in general.
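
The workflow described in the abstract (train a model, use the per-sample training error to flag likely noisy labels, then finetune the original model on the remaining data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: a linear model trained by gradient descent stands in for the deep learning model, the data are synthetic, and the median-plus-MAD threshold and the helper names `fit_gd` and `flag_noisy` are hypothetical.

```python
import numpy as np


def fit_gd(X, y, w=None, lr=0.1, steps=500):
    """Least-squares linear model trained by gradient descent.

    Passing existing weights `w` continues training from them, which
    stands in for finetuning the original model.
    """
    n, d = X.shape
    w = np.zeros(d) if w is None else w.copy()
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)
        w -= lr * grad
    return w


def flag_noisy(X, y, w, k=3.0):
    """Flag samples whose absolute training error is unusually large.

    The cutoff (median + k * scaled MAD) is an illustrative choice,
    not the threshold rule used in the paper.
    """
    err = np.abs(X @ w - y)
    med = np.median(err)
    mad = np.median(np.abs(err - med))
    return err > med + k * 1.4826 * mad


# Synthetic regression data: small measurement noise everywhere,
# plus a large label shift on 20 of the 200 samples.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.05 * rng.normal(size=n)
noisy_idx = rng.choice(n, size=20, replace=False)
y[noisy_idx] += 3.0 * rng.choice([-1.0, 1.0], size=20)

# 1) Train the original model on all (noisy) data.
w0 = fit_gd(X, y)

# 2) Use the training error of that model to flag likely noisy labels.
mask = flag_noisy(X, y, w0)

# 3) Finetune the original model on the denoised subset only.
w1 = fit_gd(X[~mask], y[~mask], w=w0, steps=200)

err_before = np.linalg.norm(w0 - w_true)
err_after = np.linalg.norm(w1 - w_true)
print(f"flagged {mask.sum()} samples")
print(f"weight error before: {err_before:.3f}, after: {err_after:.3f}")
```

In this toy setting the corrupted labels are far from the clean ones, so a simple error cutoff separates them. Real assay noise is usually much less clear-cut, which is presumably why the supporting information discusses adaptive threshold determination.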


Noise filter
Noise reduction
ADMET prediction
Drug Discovery
Deep Learning
Denoising for Regression

Supplementary materials

Supporting Information
Additional noise detection, adaptive threshold determination, QM9 result, sample imbalance, dataset size effects, and noise effects on multitask models.


Comment number 1, Rajarshi Guha: Jun 02, 2024, 14:40

Hi, interesting paper and nice results. I noticed a minor typo - on page 16 of the PDF you write "Walter et el. defined the structure activity landscape index (SALI) as shown in Equation 3" and ref 42 is cited. In fact SALI was originally defined by Guha & Van Drie in