Comparative Chemical Space Analysis of Pesticides and Substances with Genotoxicity Data

21 May 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Genotoxicity test data are required for pesticidal and biocidal active substances prior to regulatory approval, while for their metabolites and impurities, in silico predictions are often accepted. Nonetheless, the extent to which these compounds are represented in publicly available genotoxicity databases remains unclear. Herein, we use chemical space methods to define the overlap between pesticide substances and activity data for six genotoxicity test types commonly employed in regulatory toxicology: the Ames test, the in vitro mammalian cell gene mutation test, the in vitro micronucleus test, the in vitro chromosomal aberration test, the in vivo micronucleus test, and the in vivo chromosomal aberration test. After merging 18 pesticide/biocide databases and standardizing their structures, we identified 4,932 unique substances. Within 16 public genotoxicity databases, 10,020 substances had at least one data point in one of the genotoxicity tests. The chemical space overlap between the pesticide substances and each genotoxicity set was evaluated by calculating physicochemical descriptors and molecular fingerprints, which were visualized using PCA and UMAP, respectively. The chemical space of pesticide substances was well represented by Ames test data and, to varying degrees, by the other genotoxicity tests, with particularly low coverage for in vivo chromosomal aberration. The major scaffolds found in pesticide substances appeared in all genotoxicity data sets. Functional groups overrepresented in the genotoxicity data compared to pesticide substances were indicative of prototypical genotoxic substances, while those that were underrepresented included motifs potentially associated with pesticides (e.g., halogens). Chemical space methods can assist regulatory toxicologists in understanding which regions of pesticide substance chemical space are well or poorly characterized by genotoxicity data. This understanding is important for the accurate and targeted use of databases and data-based non-testing methods such as QSAR and read-across in line with regulatory requirements.
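
The supplementary tables report data set overlap by Tanimoto similarity. As a rough illustration of how such an overlap could be quantified, the sketch below scores each pesticide fingerprint by its nearest neighbour in a genotoxicity set and reports the fraction above a cutoff. This is not the authors' pipeline: fingerprints are represented here as plain sets of on-bit indices, the names `tanimoto`, `nearest_neighbor_similarity`, and `coverage` are hypothetical, and the toy data and the 0.5 threshold are assumptions chosen only for the example.

```python
from typing import FrozenSet, Iterable, List

# Assumption: a fingerprint is modelled as the set of its on-bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: |a AND b| / |a OR b|."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbor_similarity(query: Fingerprint,
                                reference: Iterable[Fingerprint]) -> float:
    """Highest Tanimoto similarity between the query and any reference entry."""
    return max((tanimoto(query, ref) for ref in reference), default=0.0)


def coverage(queries: List[Fingerprint],
             reference: List[Fingerprint],
             threshold: float) -> float:
    """Fraction of queries whose nearest reference neighbour meets the threshold."""
    if not queries:
        return 0.0
    hits = sum(
        1 for q in queries
        if nearest_neighbor_similarity(q, reference) >= threshold
    )
    return hits / len(queries)


# Toy data (invented for illustration, not taken from the study).
pesticides = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12}), frozenset({20, 21})]
ames_set = [frozenset({1, 2, 3, 5}), frozenset({10, 11, 12}), frozenset({30, 31})]

print(f"coverage at 0.5: {coverage(pesticides, ames_set, threshold=0.5):.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit after structure standardization, and the choice of fingerprint type and cutoff would strongly affect the reported coverage.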

Keywords

Genotoxicity
Data sets
Chemical space analysis
Pesticides
Regulatory toxicology

Supplementary materials

Supporting information file
Table S1: Substance and scaffold summary information for the DrugBank data; Table S2: Genotoxicity data set overlap with pesticide substances and DrugBank by Tanimoto similarity; Table S3: Endpoint data set overlap with pesticide substances and DrugBank by Tanimoto similarity

Table S4
Fraction of substances with each functional group in the processed endpoint data sets

Table S5
Count and percent of substances with each Murcko scaffold in the endpoint data sets
