The multiclass ARKA framework for developing improved quantitative RASAR models

13 January 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Over the last decade, cheminformatics has seen significant advances in the development of more robust and predictive quantitative structure-activity relationship (QSAR) models. Recently, quantitative Read-Across Structure-Activity Relationship (q-RASAR) modeling has been reported to enhance the external predictivity of QSAR models. However, in some studies, the cross-validation metrics of q-RASAR models are inferior to those of the corresponding QSAR models. Against this background, we report an improved q-RASAR workflow coupled with the Arithmetic Residuals in K-groups Analysis (ARKA) framework. This improved workflow (ARKA-RASAR) considers two important aspects: the contribution of different QSAR descriptors to different experimental response ranges, and the identification of similarity among close congeners based on both the selected QSAR descriptors and these range-specific descriptor contributions. In this study, five toxicity datasets previously used to develop QSAR and q-RASAR models were considered. We developed hybrid ARKA models (combining QSAR descriptors and ARKA descriptors), and these hybrid feature spaces were then used to compute RASAR descriptors and develop ARKA-RASAR models. For a fair comparison, we used the same modeling algorithms (Partial Least Squares and Multiple Linear Regression) that were employed for the previously reported QSAR and q-RASAR models; these algorithms are also simple, reproducible, and transferable. The multi-criteria decision-making approach Sum of Ranking Differences (SRD) indicated that the ARKA-RASAR models perform best when training, test, and cross-validation statistics are considered together.
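RASAR descriptors are derived from the similarity of each compound to its close congeners in the chosen descriptor space. As a minimal illustration of that idea, the Python sketch below computes one such quantity, a similarity-weighted read-across prediction from the k most similar training compounds, using a Gaussian-kernel similarity. The function names, the kernel, and the parameters `k` and `sigma` are illustrative assumptions, not the settings or code used in this work, and the sketch does not reproduce the full set of RASAR descriptors.

```python
import math
from typing import List, Sequence


def gaussian_similarity(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian-kernel similarity exp(-||a - b||^2 / (2 * sigma^2)); equals 1.0 for identical vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def read_across(query: Sequence[float], X_train: List[Sequence[float]],
                y_train: Sequence[float], k: int = 3, sigma: float = 1.0) -> float:
    """Similarity-weighted mean response of the k training compounds most similar to `query`.

    Hypothetical helper for illustration; k and sigma are assumed values, not the paper's settings.
    """
    sims = [gaussian_similarity(query, x, sigma) for x in X_train]
    # Indices of the k most similar training compounds (ties keep their original order).
    nearest = sorted(range(len(X_train)), key=lambda i: sims[i], reverse=True)[:k]
    weight_sum = sum(sims[i] for i in nearest)
    if weight_sum == 0.0:
        raise ValueError("all neighbour similarities underflowed to zero; try a larger sigma")
    return sum(sims[i] * y_train[i] for i in nearest) / weight_sum


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [10.0]]   # toy one-descriptor training set (made up)
    y = [1.0, 2.0, 3.0, 9.0]            # toy responses (made up)
    print(round(read_across([1.0], X, y, k=3), 6))  # prints 2.0
```

In the toy example the query sits midway between symmetric neighbours at 0.0 and 2.0 (responses 1.0 and 3.0), so the weighted average is 2.0. If such values were computed for training compounds themselves, the query compound would presumably have to be excluded from its own neighbour list, since its self-similarity of 1.0 would otherwise dominate.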
Two-way ANOVA showed that the SRD values differ significantly across the selected datasets and across the four different modeling algorithms. The least significant difference (LSD) procedure confirmed that the SRD values were significantly different for most models, supporting the unbiasedness of the workflow. A simple, free, and user-friendly Java-based tool, Multiclass ARKA-v1.0, has been developed to compute multiple ARKA descriptors quickly and efficiently according to the user's choice. The promising results and the ease of computing ARKA and RASAR descriptors with our tools suggest that the ARKA-RASAR modeling framework may be a useful choice for developing highly robust and predictive models.
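SRD compares each model's ranking of a set of objects with a reference ranking; the model whose ranking deviates least from the reference (smallest SRD) is judged best. The Python sketch below is a minimal version under stated assumptions: rows are validation metrics, columns are models, all metrics are treated as "higher is better", and the reference is the row-wise maximum. The data pretreatment, reference choice, and validation steps used in this study are not specified here; full SRD analyses normally add a randomization test, which this sketch omits. The function names and the scores in the example are hypothetical.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ascending ranks (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = tied_rank
        i = j + 1
    return ranks


def srd(table: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Sum of Ranking Differences of each model against a row-wise maximum reference.

    `table` maps a model name to its scores on the same ordered list of validation
    metrics (the rows). All metrics are assumed to be 'higher is better'.
    """
    models = list(table)
    n_rows = len(table[models[0]])
    reference = [max(table[m][r] for m in models) for r in range(n_rows)]
    ref_ranks = average_ranks(reference)
    return {
        m: sum(abs(a - b) for a, b in zip(average_ranks(table[m]), ref_ranks))
        for m in models
    }


if __name__ == "__main__":
    # Made-up scores for three hypothetical models on four metrics (not results from this study).
    scores = {
        "A": [0.80, 0.75, 0.70, 0.65],
        "B": [0.70, 0.80, 0.60, 0.72],
        "C": [0.60, 0.50, 0.78, 0.55],
    }
    print(srd(scores))  # prints {'A': 1.0, 'B': 5.0, 'C': 6.0}
```

With these toy numbers, model A orders the four metrics almost exactly as the reference does and therefore receives the smallest SRD. Error-type metrics (where lower is better) would have to be transformed or handled with a row-wise minimum before being combined in such a table.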

Keywords

QSAR
q-RASAR
ARKA
ARKA-RASAR
Hybrid ARKA
Sum of ranking differences
Toxicity

Supplementary materials

The source data used to develop the models reported in this paper are available in Supplementary Materials SI-1. The step-by-step manual computation of the ARKA descriptors for all five datasets is presented in Supplementary Materials SI-2 to SI-6. The training and test sets for the hybrid ARKA and ARKA-RASAR models for all five datasets are provided in Supplementary Materials SI-7 and SI-8, respectively.