Abstract
Genotoxicity test data are required for pesticidal and biocidal active substances before regulatory approval, whereas in silico predictions are often accepted for their metabolites and impurities. However, the extent to which these compounds are represented in publicly available genotoxicity databases remains unclear. Here, we use chemical space methods to characterize the overlap between pesticide substances and activity data for six genotoxicity test types commonly employed in regulatory toxicology: the Ames test, the in vitro mammalian cell gene mutation test, the in vitro micronucleus test, the in vitro chromosomal aberration test, the in vivo micronucleus test, and the in vivo chromosomal aberration test. After merging 18 pesticide/biocide databases and standardizing their structures, we identified 4,932 unique substances. Across 16 public genotoxicity databases, 10,020 substances had at least one data point for at least one of these tests. The chemical space overlap between the pesticide substances and each genotoxicity data set was evaluated using physicochemical descriptors and molecular fingerprints, visualized with principal component analysis (PCA) and uniform manifold approximation and projection (UMAP), respectively. The chemical space of pesticide substances was well represented by Ames test data and, to varying degrees, by data from the other genotoxicity tests, with particularly low coverage by in vivo chromosomal aberration data. The major scaffolds found in pesticide substances appeared in all genotoxicity data sets. Functional groups overrepresented in the genotoxicity data relative to the pesticide substances were characteristic of prototypical genotoxic substances, whereas underrepresented groups included motifs potentially associated with pesticides (e.g., halogens). Chemical space methods can help regulatory toxicologists identify regions of pesticide chemical space that are well or poorly characterized by genotoxicity data.
This understanding is important for the accurate and targeted use of databases and of data-based non-testing methods, such as QSAR and read-across, in line with regulatory requirements.
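The abstract does not detail how fingerprint-based coverage was quantified, although the descriptions of Tables S2 and S3 indicate that overlap was assessed by Tanimoto similarity. A minimal sketch of one common approach follows, assuming fingerprints are stored as sets of "on" bit indices and that a pesticide substance counts as covered when its nearest neighbor in a genotoxicity data set reaches a similarity threshold. The function names, the default threshold of 0.7, and the toy fingerprints are hypothetical choices, not the authors' settings; real fingerprints would come from a cheminformatics toolkit.

```python
from typing import Iterable, Sequence

# Hypothetical representation: a fingerprint is the set of indices of its "on" bits.
Fingerprint = frozenset[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two binary fingerprints: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def nearest_neighbor_similarity(query: Fingerprint, reference: Sequence[Fingerprint]) -> float:
    """Highest Tanimoto similarity between the query and any reference fingerprint."""
    return max((tanimoto(query, ref) for ref in reference), default=0.0)


def coverage(queries: Iterable[Fingerprint], reference: Sequence[Fingerprint],
             threshold: float = 0.7) -> float:
    """Fraction of queries whose nearest reference neighbor reaches the (assumed) threshold."""
    query_list = list(queries)
    if not query_list:
        return 0.0
    hits = sum(nearest_neighbor_similarity(q, reference) >= threshold for q in query_list)
    return hits / len(query_list)


# Toy fingerprints, invented for illustration only.
pesticides = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
ames = [frozenset({1, 2, 3, 5}), frozenset({20, 21})]
print(coverage(pesticides, ames, threshold=0.5))  # 0.5: only the first toy query has a neighbor >= 0.5
```

Nearest-neighbor coverage complements the PCA and UMAP plots by giving a single number per data set, but that number depends strongly on the fingerprint type and the threshold chosen.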
Supplementary materials
Supporting information file
Table S1: Substance and scaffold summary information for the DrugBank data; Table S2: Genotoxicity data set overlap with pesticide substances and DrugBank by Tanimoto similarity; Table S3: Endpoint data set overlap with pesticide substances and DrugBank by Tanimoto similarity
Table S4: Fraction of substances with each functional group in the processed endpoint data sets
Table S5: Count and percent of substances with each Murcko scaffold in the endpoint data sets
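Table S4 lists, for each functional group, the fraction of substances that carry it in each processed endpoint data set. The abstract does not state how over- and underrepresentation were scored; one simple option, sketched below under our own assumptions, is a log2 ratio of these fractions between an endpoint data set and the pesticide substances. The function names, the pseudo-fraction, and the toy group labels are hypothetical, and functional-group annotation itself (normally done with substructure patterns in a cheminformatics toolkit) is assumed to have happened already.

```python
import math
from collections import Counter
from typing import Iterable


def group_fractions(annotations: Iterable[set[str]]) -> dict[str, float]:
    """Fraction of substances carrying each functional group (counted once per substance)."""
    substances = list(annotations)
    if not substances:
        return {}
    counts = Counter(group for groups in substances for group in groups)
    return {group: count / len(substances) for group, count in counts.items()}


def log2_enrichment(target: dict[str, float], reference: dict[str, float],
                    pseudo: float = 1e-3) -> dict[str, float]:
    """log2 ratio of group fractions in target vs. reference.

    Positive values mean the group is overrepresented in the target set. The
    small pseudo-fraction (an arbitrary choice here) avoids division by zero
    for groups absent from one of the sets.
    """
    groups = set(target) | set(reference)
    return {
        g: math.log2((target.get(g, 0.0) + pseudo) / (reference.get(g, 0.0) + pseudo))
        for g in groups
    }


# Toy functional-group annotations, invented for illustration only.
pesticides = [{"aryl halide", "ester"}, {"aryl halide"}, {"amide"}, {"aryl halide", "amide"}]
genotox = [{"aromatic nitro"}, {"aromatic amine"}, {"aromatic nitro", "aryl halide"}, {"amide"}]

enrichment = log2_enrichment(group_fractions(genotox), group_fractions(pesticides))
for group, value in sorted(enrichment.items(), key=lambda item: item[1], reverse=True):
    print(f"{group:15s} {value:+.2f}")
```

Sorting by the log2 ratio places groups overrepresented in the genotoxicity set at the top and underrepresented ones at the bottom; the ratio is descriptive only and says nothing about statistical significance.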