Machine learning-assisted c-RASAR modeling of a curated set of orally active nephrotoxic drugs: Similarity-based predictions from close source neighbors

Arkaprava Banerjee; Kunal Roy

doi:10.26434/chemrxiv-2024-57klw

Theoretical and Computational Chemistry

22 August 2024, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Cheminformatics and Machine Learning (ML) have progressed rapidly in chemical risk assessment over the last decade, owing to their efficiency, accuracy, and reliability. The constant evolution of New Approach Methodologies (NAMs) has inspired researchers around the globe to move beyond conventional approaches and adopt or develop new, “unconventional” methods. The classification Read-Across Structure-Activity Relationship (c-RASAR) is one such approach: it incorporates similarity and error-based information from the nearest neighboring compounds into an ML modeling framework, resulting in enhanced predictivity. Although this technique has so far been applied to molecular descriptors, in the present study we applied it to molecular fingerprints as well as conventional molecular descriptors, developing ML-based models from a recently reported, highly curated set of orally active nephrotoxic drugs. We first developed ML models using nine different linear and non-linear algorithms, separately on molecular descriptors and on MACCS fingerprints, generating 18 different ML QSAR models. Using the chemical spaces defined by the modeling descriptors and fingerprints, we computed the similarity and error-based RASAR descriptors and used the most discriminating ones to develop another set of 18 ML c-RASAR models. All 36 models were cross-validated 20 times with a 5-fold cross-validation strategy, and their predictivity was checked on the test set data. A multi-criteria decision-making strategy, the Sum of Ranking Differences (SRD) approach, was adopted to identify the best-performing model based on robustness and external validation parameters. This statistical analysis suggested that the c-RASAR models performed well overall and that the best-performing model was itself a c-RASAR model.
This model was used to screen a true external data set prepared from the known nephrotoxic compounds in DrugBankDB. The screening results also showed that the model efficiently identifies nephrotoxic compounds. The t-SNE analyses of the descriptor, fingerprint, and RASAR descriptor spaces indicated that the RASAR descriptors efficiently encode the chemical information, as evident from the tight and distinct clustering of the data points. Additionally, the molecular descriptors and the corresponding RASAR descriptors were used to identify potential activity cliffs using the ARKA framework.
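
The idea of deriving similarity-based features from close source neighbors can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not reproduce the actual RASAR descriptor definitions (those are listed in Supplementary Information SI-2). The function names, the feature names, the choice of Tanimoto similarity, the value of `k`, and the toy fingerprints are all assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is represented here as the set of its "on" bit indices
# (an assumption for this sketch; real MACCS keys have 166 defined bits).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_features(
    query: Fingerprint,
    sources: List[Tuple[Fingerprint, int]],
    k: int = 3,
) -> Dict[str, float]:
    """Hypothetical similarity-based features for one query compound.

    `sources` holds (fingerprint, label) pairs for training compounds,
    with label 1 = nephrotoxic and 0 = non-nephrotoxic.
    Only the k most similar source compounds contribute.
    """
    scored = sorted(
        ((tanimoto(query, fp), label) for fp, label in sources),
        key=lambda pair: pair[0],
        reverse=True,
    )
    neighbors = scored[:k]
    total_sim = sum(sim for sim, _ in neighbors)
    # Similarity-weighted average of the neighbors' class labels.
    weighted_label = (
        sum(sim * label for sim, label in neighbors) / total_sim
        if total_sim > 0
        else 0.0
    )
    positive_sims = [sim for sim, label in neighbors if label == 1]
    negative_sims = [sim for sim, label in neighbors if label == 0]
    return {
        "weighted_label": weighted_label,
        "max_sim_pos": max(positive_sims, default=0.0),
        "max_sim_neg": max(negative_sims, default=0.0),
        "mean_sim": total_sim / len(neighbors) if neighbors else 0.0,
    }


# Toy data: four source compounds and one query compound (all made up).
sources = [
    (frozenset({1, 2, 3, 4}), 1),
    (frozenset({1, 2, 3}), 1),
    (frozenset({7, 8, 9}), 0),
    (frozenset({1, 8, 9}), 0),
]
query = frozenset({1, 2, 3, 5})
features = similarity_features(query, sources, k=3)
print(features)
```

In a workflow of this kind, such features would be computed for every training and test compound and then passed to a classifier in place of, or alongside, the original descriptors or fingerprint bits.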

Keywords

c-RASAR

Machine learning

Sum of Ranking Differences (SRD)

Nephrotoxicity

ARKA

t-SNE

Supplementary materials

Title: Supplementary Information

Description: Supplementary Information SI-1 contains the data set, computed descriptors for training and test sets, and prediction results for the true external set. Supplementary Information SI-2 contains the list of RASAR descriptors.

Supplementary weblinks

Title: DTC Lab Software Supplementary Site

Description: The software tools used for the Read-Across predictions and the computation of the RASAR descriptors and ARKA descriptors are freely available.

Now Published

Machine learning assisted classification RASAR modeling for the nephrotoxicity potential of a curated set of orally active drugs

Arkaprava Banerjee, Kunal Roy (journal article)

Scientific Reports, Volume 15, Issue 1

Online publication date: Jan 04, 2025

License

The content is available under CC BY-NC-ND 4.0.

Funding

Life Science Research Board, DRDO, India

LSRB/01/15001/M/LSRB-394/SH&DD/2022

Author’s competing interest statement

The author(s) have declared that they have no conflict of interest with regard to this content.

Ethics

The author(s) have declared that ethics committee/IRB approval is not relevant to this content.
