Machine learning for predicting environmental mobility based on retention behaviour

20 June 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Very persistent and very mobile (vPvM) substances pose a threat to the environment and human health. Such chemicals can persist in aquatic systems, where they move easily and quickly because of their affinity for water rather than for sorbents such as soil. Currently, the organic carbon-water partition coefficient (Koc) is used to classify chemicals as very mobile, mobile or non-mobile. However, experimental log Koc data are lacking for most chemicals, which is a major limitation. With thousands of new chemicals entering the market, and therefore our exposome, every year, there is a growing need for advanced cheminformatics tools to prioritise such chemicals of concern. Because reversed-phase liquid chromatography (RPLC) retention data are far more abundant, they were used as a proxy for environmental mobility. The fraction of organic modifier at elution was used to assign a mobility label to each of the 146,902 chemicals taken from an RPLC dataset. To relate chemical structure to mobility, the 881-bit PubChem fingerprint was computed for each chemical. A random forest classification model was then trained to predict these retention-derived mobility labels from the molecular structure encoded in the fingerprints. On the test set, the model achieved F1 scores of 0.87, 0.81 and 0.96 for the very mobile, mobile and non-mobile classes, respectively. It was then applied to all REACH-registered chemicals (n = 64,498), classifying 20% of the registry as very mobile, 26% as mobile and 53% as non-mobile. Using OPERA-predicted log Koc values instead classified 31% of the registry as very mobile, 31% as mobile and 38% as non-mobile. Previous studies were only able to assign an estimated mobility to around 20% of REACH.
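The abstract relies on two generic steps: binning a continuous quantity into three mobility classes, and scoring a classifier per class with a one-vs-rest F1 score (the harmonic mean of precision and recall, not accuracy). The sketch below illustrates both, plus the class shares reported for REACH. It is a minimal illustration under stated assumptions, not the authors' pipeline: the log Koc cutoffs (very mobile below 2, mobile below 3) are assumed from the EU CLP mobility criteria, the preprint itself derives labels from the RPLC organic modifier fraction with cutoffs not given here, and all function names and data values are hypothetical.

```python
from collections import Counter

# Assumed cutoffs, modelled on the EU CLP mobility criteria
# (very mobile: log Koc < 2; mobile: log Koc < 3). Not taken from the preprint.
VM_CUTOFF = 2.0
M_CUTOFF = 3.0


def mobility_class(log_koc: float) -> str:
    """Map a log Koc value to a mobility label using the assumed cutoffs."""
    if log_koc < VM_CUTOFF:
        return "very mobile"
    if log_koc < M_CUTOFF:
        return "mobile"
    return "non-mobile"


def per_class_f1(y_true: list[str], y_pred: list[str], label: str) -> float:
    """One-vs-rest F1 for a single class: 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def class_fractions(labels: list[str]) -> dict[str, float]:
    """Share of each predicted class, e.g. across a chemical registry."""
    counts = Counter(labels)
    n = len(labels)
    return {label: counts[label] / n for label in counts}


if __name__ == "__main__":
    # Hypothetical log Koc values; not data from the study.
    log_koc = [0.5, 1.8, 2.4, 2.9, 3.5, 4.2]
    y_true = [mobility_class(x) for x in log_koc]
    # Hypothetical predictions: one mobile chemical is predicted as very mobile.
    y_pred = ["very mobile", "very mobile", "mobile",
              "very mobile", "non-mobile", "non-mobile"]
    for label in ("very mobile", "mobile", "non-mobile"):
        print(label, round(per_class_f1(y_true, y_pred, label), 2))
    print(class_fractions(y_pred))
```

With these toy values the per-class F1 scores come out as 0.8, 0.67 and 1.0, which shows why a single misclassification between adjacent classes lowers the scores of both the mobile and the very mobile class.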

Keywords

Machine Learning
vPvM
Environmental Fate
Chromatography
Data Science
Hazard classification

Supplementary materials

SI - Machine_learning_for_predicting_environmental_mobility_based_on_retention_behaviour
Supplementary figures and information for: Machine learning for predicting environmental mobility based on retention behaviour
