The multiclass ARKA framework for developing improved quantitative RASAR models

Arkaprava Banerjee; Kunal Roy

doi:10.26434/chemrxiv-2025-5qsh2

Theoretical and Computational Chemistry



13 January 2025, Version 1

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Over the last decade, cheminformatics has seen significant advances in the development of more robust and predictive quantitative structure-activity relationship (QSAR) models. More recently, quantitative Read-Across Structure-Activity Relationship (q-RASAR) modeling has been reported to enhance the external predictivity of QSAR models. However, in some studies, the cross-validation metrics of q-RASAR models are poorer than those of the corresponding QSAR models. Against this background, we report an improved q-RASAR workflow coupled with the Arithmetic Residuals in K-groups Analysis (ARKA) framework. This improved workflow (ARKA-RASAR) considers two important aspects: (i) the contribution of different QSAR descriptors to different experimental response ranges, and (ii) the identification of similarity among close congeners based on both the selected QSAR descriptors and these range-specific descriptor contributions. Five toxicity datasets that had previously been used to develop QSAR and q-RASAR models were considered in this study. We developed hybrid ARKA models (combining QSAR descriptors and ARKA descriptors), and these hybrid feature spaces were then used to compute RASAR descriptors and develop ARKA-RASAR models. For a fair comparison, we used the same modeling algorithms (Partial Least Squares and Multiple Linear Regression) that were used to develop the previously reported QSAR and q-RASAR models; these algorithms are also simple, reproducible, and transferable. The Sum of Ranking Differences (SRD), a multi-criteria decision-making statistical approach, indicated that the ARKA-RASAR models performed best when training, test, and cross-validation statistics were considered together.
Two-way ANOVA showed that the SRD values differ significantly both across the selected datasets and across the four different modeling algorithms. The least significant difference (LSD) procedure confirmed that the SRD values differed significantly for most models, supporting the unbiased nature of the workflow. A simple, free, and user-friendly Java-based tool, Multiclass ARKA-v1.0, has been developed to compute multiple ARKA descriptors quickly and efficiently, as specified by the user. The promising results and the ease of computing ARKA and RASAR descriptors with our tools suggest that the ARKA-RASAR modeling framework may be a suitable choice for developing highly robust and predictive models.
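The SRD comparison mentioned above can be illustrated with a small sketch. SRD ranks the objects (here, validation metrics) within each column (here, models) and sums the absolute differences between each column's ranking and the ranking of a reference column; a smaller SRD means closer agreement with the reference. This is a minimal sketch under stated assumptions: the row-wise maximum is used as the reference (a common choice when every metric is "higher is better"), tied values receive averaged ranks, and SRD is normalized by its maximum possible value. The function names, metric names, and numbers are hypothetical placeholders, not the authors' code or results from this study. Metrics where lower is better (e.g. error measures) would have to be reversed first, and the permutation-based validation (CRRN) used in full SRD analyses is omitted.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def max_srd(n: int) -> float:
    """Largest possible SRD for n objects (a fully reversed ranking)."""
    return n * n / 2 if n % 2 == 0 else (n * n - 1) / 2


def srd(table: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Normalized SRD (%) per column, with the row-wise maximum as reference."""
    names = list(table)
    n_rows = len(table[names[0]])
    reference = [max(table[name][r] for name in names) for r in range(n_rows)]
    ref_ranks = average_ranks(reference)
    result = {}
    for name in names:
        col_ranks = average_ranks(table[name])
        raw = sum(abs(a - b) for a, b in zip(col_ranks, ref_ranks))
        result[name] = 100.0 * raw / max_srd(n_rows)
    return result


# Rows: R2, Q2_LOO, Q2_F1, Q2_F2 (placeholder values, not results from the study).
metrics = {
    "model_A": [0.80, 0.75, 0.70, 0.72],
    "model_B": [0.85, 0.82, 0.80, 0.79],
    "model_C": [0.78, 0.70, 0.79, 0.77],
}
print(srd(metrics))  # {'model_A': 25.0, 'model_B': 0.0, 'model_C': 75.0}
```

In this toy table model_B is best on every metric, so it coincides with the reference and scores 0; in a real SRD analysis the reference choice (maximum, mean, or a consensus) should be stated explicitly because it changes the ranking.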

Keywords

Sum of ranking differences

Toxicity

Supplementary materials


Supplementary Materials


The source data used to develop the models reported in this paper are available in Supplementary Materials SI-1. The step-by-step manual computation of the ARKA descriptors for all five datasets is presented in Supplementary Materials SI-2 to SI-6. The training and test sets for the hybrid ARKA and ARKA-RASAR models for all five datasets are provided in Supplementary Materials SI-7 and SI-8, respectively.
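As rough orientation for the ARKA computation referred to above, the idea that descriptors contribute to different experimental response ranges can be sketched as follows. This is a hypothetical reconstruction based only on the abstract's description, not the authors' published procedure or the Multiclass ARKA tool. It assumes: min-max scaling with training-set bounds; user-supplied response cutoffs that define K classes; each descriptor assigned to the class in which its mean scaled value is highest; and ARKA_k for a compound taken as the mean scaled value of the descriptors assigned to class k. The names `fit_arka`, `transform_arka`, and `ArkaModel` are illustrative only.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence

Matrix = List[List[float]]


@dataclass
class ArkaModel:
    lo: List[float]      # training-set minimum of each descriptor
    hi: List[float]      # training-set maximum of each descriptor
    groups: List[int]    # response class each descriptor is assigned to
    n_classes: int


def _scale(X: Sequence[Sequence[float]], lo: List[float], hi: List[float]) -> Matrix:
    # Min-max scaling with training-set bounds; constant descriptors map to 0.0.
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]


def fit_arka(X: Sequence[Sequence[float]], y: Sequence[float],
             cutoffs: Sequence[float]) -> ArkaModel:
    n_desc = len(X[0])
    lo = [min(row[d] for row in X) for d in range(n_desc)]
    hi = [max(row[d] for row in X) for d in range(n_desc)]
    scaled = _scale(X, lo, hi)
    edges = sorted(cutoffs)
    # Class index of each compound = number of cutoffs <= its response.
    classes = [bisect_right(edges, v) for v in y]
    n_classes = len(edges) + 1
    groups = []
    for d in range(n_desc):
        means = []
        for c in range(n_classes):
            vals = [scaled[i][d] for i, ci in enumerate(classes) if ci == c]
            # An empty class can never be chosen for a descriptor.
            means.append(sum(vals) / len(vals) if vals else float("-inf"))
        groups.append(means.index(max(means)))
    return ArkaModel(lo, hi, groups, n_classes)


def transform_arka(X: Sequence[Sequence[float]], model: ArkaModel) -> Matrix:
    out = []
    for row in _scale(X, model.lo, model.hi):
        feats = []
        for c in range(model.n_classes):
            members = [v for v, g in zip(row, model.groups) if g == c]
            # Assumption: a class with no assigned descriptors gets ARKA = 0.0.
            feats.append(sum(members) / len(members) if members else 0.0)
        out.append(feats)
    return out


# Toy training data (placeholders): three descriptors, six compounds.
X_train = [[0.0, 5.0, 1.0], [1.0, 4.0, 1.0], [2.0, 3.0, 5.0],
           [3.0, 2.0, 5.0], [4.0, 1.0, 1.0], [5.0, 0.0, 1.0]]
y_train = [1.0, 1.5, 3.0, 3.5, 5.0, 5.5]
model = fit_arka(X_train, y_train, cutoffs=[2.0, 4.0])
print(model.groups)                               # [2, 0, 1]
print(transform_arka([[4.0, 1.0, 5.0]], model))   # [[0.2, 1.0, 0.8]]
```

In this sketch the scaling bounds and descriptor-to-class assignments are fitted on the training set only and then reused for test compounds, which keeps test information out of the descriptor construction.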


Supplementary weblinks


RASAR Descriptor Calculator


The software tools for the computation of the RASAR descriptors can be accessed from this link.


Multi-Class ARKA


The software tools for the computation of the multiclass ARKA descriptors can be accessed from this link.
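The RASAR Descriptor Calculator listed above derives descriptors from the similarity of each compound to its close training-set congeners. According to the abstract, in ARKA-RASAR this similarity is computed in the hybrid (QSAR + ARKA) descriptor space. One such descriptor, a similarity-weighted read-across prediction, can be sketched as below. This is a minimal sketch, not the calculator's implementation. It assumes a Gaussian kernel of the form exp(-d²/(2σ²)) on squared Euclidean distance, the k most similar training compounds as neighbours, and a hypothetical `exclude` argument that leaves a training compound out of its own neighbour set. The names `gaussian_similarity` and `ra_function` and the default values of `k` and `sigma` are illustrative assumptions. The tool's actual kernels, defaults, and further similarity- and error-based descriptors are not reproduced.

```python
import math
from typing import Optional, Sequence


def gaussian_similarity(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    """Gaussian-kernel similarity exp(-d^2 / (2 sigma^2)) from squared Euclidean distance."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def ra_function(query: Sequence[float],
                train_X: Sequence[Sequence[float]],
                train_y: Sequence[float],
                k: int = 3,
                sigma: float = 1.0,
                exclude: Optional[int] = None) -> float:
    """Similarity-weighted mean response of the k most similar training compounds.

    `exclude` drops one training index so a training compound is not its own neighbour.
    """
    scored = [
        (gaussian_similarity(query, x, sigma), y)
        for i, (x, y) in enumerate(zip(train_X, train_y))
        if i != exclude
    ]
    if not scored or k < 1:
        raise ValueError("need at least one candidate neighbour and k >= 1")
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    total = sum(s for s, _ in top)
    if total == 0.0:  # all similarities underflowed: fall back to an unweighted mean
        return sum(y for _, y in top) / len(top)
    return sum(s * y for s, y in top) / total


# Toy one-descriptor example (placeholder values).
train_X = [[0.0], [1.0], [2.0], [10.0]]
train_y = [1.0, 2.0, 3.0, 100.0]
print(round(ra_function([1.0], train_X, train_y, k=3), 6))  # 2.0
```

Restricting the weighted average to the k closest congeners keeps distant compounds (here the outlier with response 100.0) from influencing the prediction, which is the intuition behind read-across.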



Now Published

The multiclass ARKA framework for developing improved q-RASAR models for environmental toxicity endpoints

Arkaprava Banerjee, Kunal Roy (journal article)

Environmental Science: Processes & Impacts

Online publication date: 2025

Version History

Jan 13, 2025 Version 1


License

The content is available under CC BY NC ND 4.0


Funding

Life Sciences Research Board

LSRB/01/15001/M/LSRB-394/SH&DD/2022

Indian Council of Medical Research

BMI/12(73)/2022,

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
