Critical Assessment of pH-Dependent Lipophilic Profiles of Small Molecules for Drug Design: Which One Should We Use and In Which Cases?

24 May 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Lipophilicity is a physicochemical property with wide relevance in drug design and also applied in areas such as food chemistry, environmental chemistry, and computational biology. This descriptor strongly influences the absorption, distribution, permeability, bioaccumulation, protein-binding, and biological activity of bioorganic compounds. Lipophilicity is commonly expressed as the n-octanol/water partition coefficient (PN) for neutral molecules, whereas for molecules with ionizable groups, the distribution coefficient (D) at a given pH is used. The logDpH is usually predicted using a pH correction over the logPN using the pKa of ionizable molecules, while often ignoring the ionic partition (PI) because of the challenge of predicting the partitioning of the charged species. In this work, we studied the impact of PI on the prediction of lipophilicity of small drug-like molecules by modeling 225 logDpH of a set of experimental values using both the formalism that takes into account a pH correction (see Eq. 1) and the one considering the partition of ionic species (see Eq. 2). Our findings show that a better calculation is obtained by considering the ionic partition while ignoring its contribution can lead to inadequate computational predictions. In this context, we developed machine learning algorithms to determine in which cases the PI should be considered. The results indicate that small, compact, and hydrophobic molecules with a higher likelihood of being in their ionic state at specific pH values, were better modeled using Eq. 2. Finally, we validated our findings using a test and external set where the logistic regressions, random forest classifications, and support vector machine models predicted the better formalism to determine the logDpH for each molecule with high accuracies, sensitivities, and specificities.


Partition coefficient
Lipophilic profiles
Machine learning
Drug Design


Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy [opens in a new tab] - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here [opens in a new tab] .
This site is protected by reCAPTCHA and the Google Privacy Policy [opens in a new tab] and Terms of Service [opens in a new tab] apply.