Abstract
Very persistent and very mobile (vPvM) substances pose a threat to the environment and human health. These chemicals can persist in aquatic systems, where they move easily and quickly because they have a greater affinity for water than for sorbents such as soil. Currently, the organic carbon-water partition coefficient (Koc) is used to classify chemicals as very mobile, mobile or non-mobile. However, the lack of experimental log Koc data for most chemicals is a major limitation. With thousands of new chemicals entering the market, and therefore our exposome, every year, there is a growing need for advanced cheminformatics tools to prioritise such chemicals of concern. Because experimental reverse-phase liquid chromatography (RPLC) data are far more abundant, retention behaviour was used here as a proxy for environmental mobility. The organic modifier fraction at elution was used to assign a mobility label to each of the 146,902 chemicals taken from an RPLC dataset. To relate chemical structure to mobility, the 881 PubChem fingerprints were computed for each chemical. A random forest classification model was then developed to predict the retention-derived mobility class of a chemical from its fingerprints, which implicitly encode molecular structure. On the test set, the model achieved F1 scores of 0.87, 0.81, and 0.96 for the very mobile, mobile, and non-mobile classes, respectively. It was then applied to all REACH-registered chemicals (n = 64,498). The model classified 20% of the registry as very mobile, 26% as mobile and 53% as non-mobile. By comparison, using OPERA-predicted log Koc values for the registry resulted in 31% being classified as very mobile, 31% as mobile and 38% as non-mobile. Previous studies have only been able to assign an estimated mobility to around 20% of the REACH registry.
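As a rough illustration of two steps mentioned above (assigning mobility labels from the organic modifier fraction at elution, and reporting one F1 score per class), the following minimal Python sketch may help. The abstract does not state the cut-off values, so `VERY_MOBILE_MAX` and `MOBILE_MAX` are hypothetical placeholders, as are the function names. The sketch assumes that earlier elution (a lower organic modifier fraction) indicates a more mobile chemical. It does not reproduce the random forest model or the fingerprint calculation.

```python
# Hypothetical cut-offs on the organic modifier fraction at elution (0 to 1).
# The abstract does not give the values used; these are placeholders only.
VERY_MOBILE_MAX = 0.2
MOBILE_MAX = 0.6


def mobility_label(modifier_fraction,
                   very_mobile_max=VERY_MOBILE_MAX,
                   mobile_max=MOBILE_MAX):
    """Map an organic modifier fraction at elution to a mobility class.

    Assumes that a chemical eluting at a lower organic modifier fraction
    is more polar and therefore more mobile.
    """
    if not 0.0 <= modifier_fraction <= 1.0:
        raise ValueError("modifier_fraction must lie between 0 and 1")
    if modifier_fraction <= very_mobile_max:
        return "very mobile"
    if modifier_fraction <= mobile_max:
        return "mobile"
    return "non-mobile"


def per_class_f1(y_true, y_pred):
    """Return a dict of one-vs-rest F1 scores, one entry per class label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Made-up elution fractions and made-up predictions, for illustration only.
    fractions = [0.05, 0.15, 0.45, 0.90]
    true_labels = [mobility_label(f) for f in fractions]
    predicted = ["very mobile", "mobile", "mobile", "non-mobile"]
    for label, score in per_class_f1(true_labels, predicted).items():
        print(f"{label}: F1 = {score:.2f}")
```

Reporting F1 per class rather than a single overall accuracy shows how well each mobility class is recovered, which matters when the classes differ in size.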
Supplementary materials
Title
SI - Machine_learning_for_predicting_environmental_mobility_based_on_retention_behaviour
Description
Supplementary figures and information for: Machine learning for predicting environmental mobility based on retention behaviour