Abstract
Over the last decade, cheminformatics has seen significant advances in the development of more robust and predictive quantitative structure-activity relationship (QSAR) models. Recently, quantitative Read-Across Structure-Activity Relationship (q-RASAR) modeling has been reported to enhance the external predictivity of QSAR models. However, in some studies, the cross-validation metrics of q-RASAR models are poorer than those of the corresponding QSAR models. Against this background, we report an improved q-RASAR workflow coupled with the Arithmetic Residuals in K-groups Analysis (ARKA) framework. This improved workflow (ARKA-RASAR) considers two important aspects: the contribution of different QSAR descriptors to different experimental response ranges, and the identification of similarity among close congeners based on both the selected QSAR descriptors and their contributions to different experimental response ranges. In this study, we considered five toxicity datasets that had previously been used to develop QSAR and q-RASAR models. We developed hybrid ARKA models (combining QSAR descriptors and ARKA descriptors), and these hybrid feature spaces were then used to compute RASAR descriptors and develop ARKA-RASAR models. For a fair comparison, we used the same modeling algorithms (Partial Least Squares and Multiple Linear Regression) that had been used to develop the previously reported QSAR and q-RASAR models; these algorithms are also simple, reproducible, and transferable. The Sum of Ranking Differences (SRD), a multi-criteria decision-making statistical approach, indicated that the ARKA-RASAR models perform best when training, test, and cross-validation statistics are considered together.
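RASAR descriptors are derived from the similarity of each compound to its close training-set congeners. The sketch below illustrates two quantities of this kind: a similarity-weighted mean response of the close congeners and the weighted spread of their responses. It is a minimal sketch under stated assumptions and is not the implementation used by the RASAR Descriptor Calculator. The function name `read_across_descriptors`, the Euclidean distance, the Gaussian-kernel similarity, and the parameters `k` and `sigma` are all hypothetical choices made for illustration.

```python
import numpy as np


def read_across_descriptors(X_train, y_train, X_query, k=10, sigma=1.0, exclude_self=False):
    """Hypothetical sketch of two RASAR-type descriptors per query compound.

    Assumptions (not taken from the paper): Euclidean distance in the given
    feature space, Gaussian-kernel similarity exp(-d^2 / (2 * sigma^2)), and
    the k nearest training compounds taken as the close congeners.
    Returns an array with columns [weighted_mean_response, weighted_sd_response].
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)

    # Pairwise Euclidean distances, shape (n_query, n_train)
    diff = X_query[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    if exclude_self:
        # When the query set is the training set, a compound must not be its own congener
        np.fill_diagonal(dist, np.inf)

    out = np.empty((X_query.shape[0], 2))
    for i, row in enumerate(dist):
        idx = np.argsort(row)[:k]                          # k closest training compounds
        sim = np.exp(-row[idx] ** 2 / (2 * sigma ** 2))    # Gaussian-kernel similarity
        w = sim / sim.sum()                                # normalised similarity weights
        mean_resp = np.dot(w, y_train[idx])                # similarity-weighted mean response
        sd_resp = np.sqrt(np.dot(w, (y_train[idx] - mean_resp) ** 2))  # weighted spread
        out[i] = (mean_resp, sd_resp)
    return out
```

For training-set compounds, `exclude_self=True` mimics a leave-one-out setting, so a compound's own response does not leak into its descriptors. For test-set compounds, the default `exclude_self=False` is appropriate.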
A two-way ANOVA showed that the SRD values differ significantly across the selected datasets and across the four different modeling algorithms. A post hoc least significant difference (LSD) procedure confirmed that the SRD values were significantly different for most models, indicating that the workflow is unbiased. A simple, free, and user-friendly Java-based tool, Multiclass ARKA-v1.0, has been developed to compute multiple ARKA descriptors quickly and efficiently according to the user's choice. The promising results and the ease of computing ARKA and RASAR descriptors with our tools suggest that the ARKA-RASAR modeling framework may be a good choice for developing highly robust and predictive models.
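The SRD comparison referred to in the abstract ranks the rows (objects) of a table once per column (model) and once by a reference column. For each model, it then sums the absolute rank differences; a smaller SRD means the model's ranking is closer to the reference. The following is a minimal, generic sketch of that calculation. The function names are hypothetical. The user must supply the reference column; the row mean or row maximum are common choices, but which reference the authors used is not stated here. The randomization test (CRRN) and the normalization of SRD values used in full SRD analyses are omitted.

```python
import numpy as np


def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values get their mean rank,
    the same convention as scipy.stats.rankdata(method="average")."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def srd(scores, reference):
    """Sum of Ranking Differences for each column (model) of `scores`.

    scores: array of shape (n_objects, n_models).
    reference: array of length n_objects (e.g. a row mean or row maximum).
    """
    scores = np.asarray(scores, dtype=float)
    ref_ranks = average_ranks(reference)
    return np.array([np.abs(average_ranks(col) - ref_ranks).sum() for col in scores.T])
```

For example, with `scores = [[0.9, 0.5], [0.8, 0.7], [0.7, 0.9]]` and `reference = [0.9, 0.8, 0.7]`, the first column reproduces the reference ranking exactly (SRD 0), while the second reverses it (SRD 4).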
Supplementary materials
Title
Supplementary Materials
Description
The source data used to develop the models reported in this paper are available in Supplementary Materials SI-1. The step-by-step manual computation of the ARKA descriptors for all five datasets is presented in Supplementary Materials SI-2 to SI-6. The training and test sets for the hybrid ARKA and ARKA-RASAR models for all five datasets are presented in Supplementary Materials SI-7 and SI-8, respectively.
Supplementary weblinks
Title
RASAR Descriptor Calculator
Description
The software tools for the computation of the RASAR descriptors can be accessed from this link.
Title
Multi-Class ARKA
Description
The software tools for the computation of the multiclass ARKA descriptors can be accessed from this link.