Abstract
Polyparameter linear free energy relationships (PP-LFERs) are accurate and robust models to predict equilibrium partition coefficients (K) of organic chemicals. The accuracy of predictions by a PP-LFER depends on the composition of the respective calibration data set. It is generally expected that extrapolation outside the model calibration domain is less accurate than interpolation. In this study, the applicability domain (AD) of PP-LFERs is systematically evaluated by calculation of the leverage (h), a measure of distance from the calibration set in the descriptor space. Repeated simulations with experimental data show that the root mean squared error of predictions increases with h, and that large prediction errors (> 3 SDtraining, where SDtraining is the standard deviation of the training data) occur more frequently when h exceeds the common threshold of 3 hmean, where hmean is the mean h of all training compounds. Nevertheless, the analysis also shows that well-calibrated PP-LFERs with many (e.g., 100), diverse, and accurate training data are highly robust against extrapolation; extreme prediction errors (> 5 SDtraining) are rare. For such PP-LFERs, 3 hmean may be too strict as the cutoff for the AD. Evaluation of published PP-LFERs in terms of their AD, using 25 chemically diverse, environmentally relevant chemicals as AD probes, indicated that many reported PP-LFERs do not cover organosiloxanes, per- and polyfluorinated alkyl substances, highly polar chemicals, and/or highly hydrophobic chemicals in their AD. It is concluded that calculation of h is useful to identify model extrapolations as well as the strengths and weaknesses of the trained PP-LFERs.
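As an illustration of the leverage check described above, the following is a minimal sketch and not the authors' implementation. It assumes the standard regression definition of leverage, h = x (X'X)^-1 x', with an intercept column added to the descriptor matrix. The function names, the use of five descriptors per compound, and the random synthetic descriptor values are all assumptions made for the example; only the 3 hmean cutoff and the training set size of 100 come from the text.

```python
import numpy as np


def leverage(X_train, X_query):
    """Leverage of each query row relative to the training descriptors.

    Assumes the usual definition h = x (X'X)^-1 x', where X is the
    training descriptor matrix with an intercept column prepended.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    # Prepend the intercept column to both matrices.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    XtX_inv = np.linalg.pinv(A.T @ A)
    # Row-wise quadratic form q (X'X)^-1 q' for every query row q.
    return np.einsum("ij,jk,ik->i", Q, XtX_inv, Q)


def in_domain(X_train, X_query, factor=3.0):
    """Return (h, flag); flag is True where h <= factor * hmean."""
    h_mean = leverage(X_train, X_train).mean()
    h_query = leverage(X_train, X_query)
    return h_query, h_query <= factor * h_mean


# Synthetic data only: 100 hypothetical training compounds, 5 descriptors.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 2.0, size=(100, 5))
h_train = leverage(X_train, X_train)

# One probe at the centre of the training data, one far outside it.
centroid = X_train.mean(axis=0)
far = centroid + 10.0
h, ok = in_domain(X_train, np.vstack([centroid, far]))
```

For a full-rank descriptor matrix, hmean equals (k + 1)/n for k descriptors and n training compounds, so 3 hmean is 0.18 in this synthetic setup. A probe at the centroid of the training set has h = 1/n and is flagged as inside the AD, while the distant probe is flagged as an extrapolation.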
Supplementary materials
Electronic supplementary information: Additional tables and figures