Abstract
Advances in cheminformatics have reduced the need for animal testing to estimate the activity, property, or toxicity of query chemicals. Read-Across Structure-Activity Relationship (RASAR) is an emerging concept that uses various similarity functions derived from chemical information to develop highly predictive models. Unlike quantitative structure-activity relationship (QSAR) models, the RASAR descriptors of a query compound are computed from its close congeners instead of from the compound itself, thus targeting predictions in the model training phase. The objective of the present study is not to propose new QSAR models for skin sensitization, but to demonstrate the improvement in the quality of predictions of the skin-sensitizing potential of organic compounds achieved by developing classification-based RASAR (c-RASAR) models. A diverse, previously curated dataset was collected from the literature, and 2D descriptors were computed for it. The essential features extracted were then used to develop a classification-based linear discriminant analysis (LDA) QSAR model. Furthermore, RASAR descriptors were calculated from the Read-Across-based predictions, using the basic hyperparameter settings for the Laplacian kernel-based optimum similarity measure. After feature selection, an LDA c-RASAR model was developed that surpassed the prediction quality of the LDA QSAR model. Various other combinations of RASAR descriptors were also used to develop additional c-RASAR models, all of which showed better prediction quality than the LDA QSAR model while using fewer descriptors. Several other machine learning c-RASAR models were also developed for comparison. In this work, we have proposed and analyzed three new similarity metrics: gm_class, sm1, and sm2. The first is an indicator variable used to generate a simple univariate c-RASAR model with good prediction ability, while the other two are similarity indices used to analyze possible activity cliffs in the training and test sets and are believed to play an important role in the modelability analysis of datasets.
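To make concrete the idea of computing descriptors from a query compound's close congeners, the following minimal sketch uses a Laplacian kernel over descriptor vectors. It is illustrative only and is not the authors' implementation: the function names, the `gamma` and `k` parameters, the toy data, and the three output descriptors (`max_sim`, `mean_sim`, `weighted_class`) are all assumptions, and the paper's own descriptors (including gm_class, sm1, and sm2) are not reproduced here.

```python
import numpy as np

def laplacian_similarity(query, sources, gamma=1.0):
    """Laplacian kernel exp(-gamma * L1 distance) between a query and each source row."""
    l1 = np.abs(sources - query).sum(axis=1)
    return np.exp(-gamma * l1)

def read_across_descriptors(query, sources, labels, gamma=1.0, k=3):
    """Hypothetical similarity-based descriptors built from the k closest source compounds."""
    sim = laplacian_similarity(query, sources, gamma)
    top = np.argsort(sim)[::-1][:k]          # indices of the k most similar sources
    w = sim[top]
    return {
        "max_sim": float(w.max()),           # similarity to the closest congener
        "mean_sim": float(w.mean()),         # average similarity over the k congeners
        # similarity-weighted average of the congeners' binary class labels
        "weighted_class": float(np.dot(w, labels[top]) / w.sum()),
    }

# Toy data (assumed): three source compounds with two scaled descriptors each.
sources = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
labels = np.array([1, 0, 1])                 # 1 = sensitizer, 0 = non-sensitizer
desc = read_across_descriptors(np.array([0.0, 0.0]), sources, labels, gamma=1.0, k=2)
print(desc)
```

In a workflow of this kind, such descriptors would be computed for every training and test compound (always from training-set congeners) and then passed to a classifier such as LDA.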
Supplementary materials
Supplementary Materials SI-1 and SI-2
SI-1 contains the raw data used for the modeling analysis, in Excel format.
SI-2 is a Word file containing the Supplementary Tables.