A Machine Learning Approach for Predicting Defluorination of Per- and Polyfluoroalkyl Substances (PFAS) for Their Efficient Treatment and Removal

03 September 2019, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


We present the first application of machine learning to per- and polyfluoroalkyl substances (PFAS) for predicting and rationalizing carbon-fluorine (C–F) bond dissociation energies to aid in their efficient treatment and removal. Using a variety of machine learning algorithms (including Random Forest, Least Absolute Shrinkage and Selection Operator (LASSO) Regression, and Feed-forward Neural Networks), we obtained highly accurate predictions of C–F bond dissociation energies, with deviations of less than 0.70 kcal/mol, which is within chemical accuracy of the PFAS reference data. In addition, we show that our machine learning approach is extremely efficient: it requires less than 10 minutes to train the models and less than a second to predict the C–F bond dissociation energy of a new compound. It needs only the simple chemical connectivity of a PFAS structure to yield reliable results, without recourse to a computationally expensive quantum mechanical calculation or a three-dimensional structure. Finally, we present an unsupervised machine learning algorithm that can automatically classify and rationalize chemical trends in PFAS structures that would otherwise be difficult to visualize or process manually. Collectively, these studies (1) comprise the first application of machine learning techniques to PFAS structures for predicting and rationalizing C–F bond dissociation energies and (2) show immense promise for assisting experimentalists in the targeted defluorination of specific bonds in PFAS structures (or other unknown environmental contaminants) of increasing complexity.
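To make the supervised-learning idea concrete, the following is a minimal sketch of LASSO regression (one of the algorithm families named above) fitted by cyclic coordinate descent. It is an illustration under stated assumptions, not the authors' implementation: the three connectivity-style descriptors (`n_F_on_carbon`, `bonds_to_head_group`, `is_terminal`) are hypothetical names, and the target values are synthetic numbers in arbitrary units generated from a known linear rule, not bond dissociation energies from the preprint.

```python
# Minimal LASSO sketch (cyclic coordinate descent) on SYNTHETIC data.
# Descriptor names and target values are hypothetical placeholders,
# not features or energies taken from the preprint.

def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_fit(X, y, lam=0.01, n_iter=100):
    """Minimise (1/2n)*||y - b - Xw||^2 + lam*||w||_1; the intercept b is unpenalised."""
    n, p = len(X), len(X[0])
    # Centre features and targets so the intercept can be recovered afterwards.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(w[k] * Xc[i][k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - partial)
            rho /= n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    b = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return w, b

def predict(x, w, b):
    """Linear prediction for one descriptor vector."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

# Synthetic design: every combination of three hypothetical descriptors
# (n_F_on_carbon, bonds_to_head_group, is_terminal).
X = [[a, d, t] for a in (1, 2, 3) for d in (1, 2, 3, 4) for t in (0, 1)]
# Synthetic targets (arbitrary units); the third descriptor is irrelevant by construction.
y = [100.0 + 2.0 * a - 1.5 * d for a, d, t in X]

w, b = lasso_fit(X, y)
print("weights:", w, "intercept:", b)
print("prediction for [2, 3, 1]:", predict([2, 3, 1], w, b))
```

In this toy setting the L1 penalty drives the weight of the irrelevant descriptor to zero while only slightly shrinking the other two, which is the feature-selection behaviour that makes LASSO useful for rationalizing which structural descriptors matter.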


perfluoroalkyl substances
polyfluoroalkyl substances
Machine Learning
environmental chemistry
Water Treatment
bond dissociation energy
Defluorination reactions
Supervised Machine Learning
Unsupervised Machine Learning

Supplementary materials

supplementary info

