Prediction of acute toxicity of organic contaminants to fish: model development and a novel approach to identify reactive substructures

02 January 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

In this study, count-based Morgan fingerprints (CMFs) were used to represent the fundamental chemical structures of contaminants, and a neural network model (R²=0.76) was developed to predict the acute fish toxicity (AFT) of organic compounds; this model surpassed previous models. We found that the limited ability of binary fingerprints to distinguish homologous compounds may account for their suboptimal performance. The principles of CMF generation and collision were explored, and an improved method based on Tanimoto distance was introduced to calculate the molecular similarity of compounds represented by CMFs. The toxic substructures identified by the Shapley additive explanation (SHAP) method were substituted benzenes, long carbon chains, unsaturated carbons and halogen atoms. By incorporating KOW as a feature and monitoring the resulting shifts in feature importance, we further delineated the influence of substructures on AFT, revealing their roles in facilitating exposure and in reactive toxicity. On this basis, we compared the toxicity of similar substructures and of the same substructure in different chemical environments. To overcome the limitations of SHAP analysis, this study proposes a new method, the toxicity index (TI), to identify substructures that are present in small quantities but are highly toxic. With TI, we identified several important substructures, such as parathion and polycyclic substituents. We also found that the toxicity of large substructures may have been misestimated in previous studies.
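The contrast between binary and count-based fingerprints for homologous compounds can be illustrated with a small sketch. It is not the similarity method developed in this study: it uses the standard generalized (min/max) Tanimoto coefficient for count vectors, and the substructure labels and counts are hypothetical stand-ins for hashed Morgan environments. The function names `binary_tanimoto` and `count_tanimoto` are likewise illustrative.

```python
from collections import Counter


def binary_tanimoto(a, b):
    """Tanimoto similarity on presence/absence of substructures."""
    keys_a = {k for k, v in a.items() if v > 0}
    keys_b = {k for k, v in b.items() if v > 0}
    union = keys_a | keys_b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(keys_a & keys_b) / len(union)


def count_tanimoto(a, b):
    """Generalized (min/max) Tanimoto similarity on substructure counts."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 1.0


# Hypothetical, hand-written counts for two homologous 1-alkanols.
# Real Morgan fingerprints use hashed atom environments, not these labels.
hexanol = Counter({"CH3": 1, "CH2": 5, "OH": 1})
decanol = Counter({"CH3": 1, "CH2": 9, "OH": 1})

print(binary_tanimoto(hexanol, decanol))  # same substructure set in both
print(count_tanimoto(hexanol, decanol))   # counts differ, so similarity < 1
```

In this toy case the binary comparison cannot separate the two homologues because both contain the same set of substructures, whereas the count-based comparison reflects the difference in chain length. The corresponding distance is one minus the similarity.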

Keywords

LC50
QSAR
count-based Morgan fingerprint
machine learning
acute fish toxicity

Supplementary materials

Title: SI
Description: SI texts and figures

Title: SI
Description: SI tables and datasets
