Abstract
In this study, count-based Morgan fingerprints (CMFs) were used to represent the fundamental chemical structures of contaminants, and a neural network model (R²=0.76) was developed to predict the acute fish toxicity (AFT) of organic compounds, outperforming previous models. We found that the limited ability of binary fingerprints to distinguish homologous compounds may account for their suboptimal performance. The principles of CMF generation and collision were explored, and an improved method based on Tanimoto distance was introduced to calculate the molecular similarity represented by CMFs. The toxic substructures identified by the Shapley additive explanations (SHAP) method were substituted benzenes, long carbon chains, unsaturated carbons, and halogen atoms. By incorporating the octanol-water partition coefficient (KOW) and monitoring shifts in feature importance, the influence of substructures on AFT was further delineated, revealing their roles in facilitating exposure and in reactive toxicity. On this basis, we compared the toxicity of similar substructures and of the same substructure in different chemical environments. To overcome the limitations of SHAP analysis, this study proposes a new method, the toxicity index (TI), to identify substructures that are present in small quantities but are highly toxic. With TI, we identified several important substructures, such as parathion and polycyclic substituents. We also found that the toxicity of large substructures may have been misestimated in previous studies.
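To illustrate why binary fingerprints can fail to separate homologous compounds while count-based ones do not, the following minimal sketch compares two similarity measures on toy data. It is not the improved method described in the abstract, whose details are not given here: the min/max (generalized) Tanimoto form, the function names, and the hand-written substructure counts for 1-hexanol and 1-decanol are all assumptions made for illustration, and real CMFs would be hashed atom environments rather than named groups.

```python
from collections import Counter


def binary_tanimoto(a, b):
    """Tanimoto similarity using only presence/absence of each feature."""
    keys_a = {k for k, v in a.items() if v > 0}
    keys_b = {k for k, v in b.items() if v > 0}
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 1.0


def count_tanimoto(a, b):
    """Min/max (generalized) Tanimoto similarity using feature counts.

    Assumed form for illustration; not necessarily the paper's method.
    """
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 1.0


# Hypothetical, simplified substructure counts for two homologous alcohols.
hexanol = Counter({"CH3": 1, "CH2": 5, "OH": 1})
decanol = Counter({"CH3": 1, "CH2": 9, "OH": 1})

sim_binary = binary_tanimoto(hexanol, decanol)  # same feature set in both
sim_count = count_tanimoto(hexanol, decanol)    # chain length now matters

print(f"binary similarity: {sim_binary:.3f}, distance: {1 - sim_binary:.3f}")
print(f"count similarity:  {sim_count:.3f}, distance: {1 - sim_count:.3f}")
```

In this toy case the binary measure treats the two alcohols as identical because they share the same set of features, whereas the count-based measure separates them through the differing number of CH2 groups.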
Supplementary materials
SI: SI texts and figures
SI: SI tables and datasets