Molecular Partition Coefficient from Machine Learning with Polarization and Entropy Embedded Atom-Centered Symmetry Functions

Authors

Abstract

Efficient prediction of the partition coefficient ($\log P$) between polar and non-polar phases could shorten the cycle of drug and materials design. In this work, a descriptor named $\langle q-ACSFs \rangle_{conf}$ is proposed that accounts for both the explicit polarization effects in the polar phase and the energetically and entropically significant conformation ensemble in the non-polar phase. Polarization is included by embedding partial charges, derived directly from force fields or quantum chemistry calculations, into the atom-centered symmetry functions (ACSFs). Entropy effects are included by averaging over different conformations, selected from a similarity matrix, according to their Boltzmann distribution. The model was trained with high-dimensional neural networks (HDNNs) on the public PhysProp dataset ($41039$ samples). Satisfactory $\log P$ prediction performance was achieved on three other datasets, namely Martel ($707$ molecules), Star \& Non-Star ($266$), and Huuskonen ($1870$). The $\langle q-ACSFs \rangle_{conf}$ model was also applicable to $n$-carboxylic acids with $2$ to $14$ carbon atoms and to $54$ organic solvents. The method is readily applied to systems of arbitrary size and yields a transferable atom-based partition coefficient.
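The two ingredients of the descriptor described in the abstract, charge embedding and Boltzmann averaging over conformers, can be illustrated with a minimal sketch. The abstract does not give the functional form, so everything below is an assumption made for illustration: a radial ACSF with a cosine cutoff in which each neighbor term is multiplied by that neighbor's partial charge, partial charges held fixed across conformers, conformer energies in kcal/mol, and hypothetical function names and parameters (`eta`, `r_s`, `r_c`). The similarity-matrix selection of conformers is not modelled.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def cutoff(r, r_c):
    """Cosine cutoff often used with ACSFs; zero at and beyond r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0


def charge_weighted_radial_acsf(coords, charges, eta=1.0, r_s=0.0, r_c=6.0):
    """One radial symmetry-function value per atom.

    Assumed form (not taken from the paper): each neighbor j contributes
    q_j * exp(-eta * (r_ij - r_s)^2) * f_c(r_ij).
    """
    values = []
    for i, r_i in enumerate(coords):
        total = 0.0
        for j, r_j in enumerate(coords):
            if i == j:
                continue  # an atom is not its own neighbor
            d = math.dist(r_i, r_j)
            total += charges[j] * math.exp(-eta * (d - r_s) ** 2) * cutoff(d, r_c)
        values.append(total)
    return values


def boltzmann_weights(energies, temperature=298.15):
    """Normalized Boltzmann weights; energies in kcal/mol (assumed unit)."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    raw = [math.exp(-(e - e_min) / (KB_KCAL_PER_MOL_K * temperature))
           for e in energies]
    z = sum(raw)
    return [x / z for x in raw]


def conformer_averaged_descriptor(conformers, charges, energies,
                                  temperature=298.15, **acsf_kwargs):
    """Boltzmann-weighted average of the per-atom descriptor over conformers."""
    weights = boltzmann_weights(energies, temperature)
    averaged = [0.0] * len(charges)
    for w, coords in zip(weights, conformers):
        g = charge_weighted_radial_acsf(coords, charges, **acsf_kwargs)
        for i, g_i in enumerate(g):
            averaged[i] += w * g_i
    return averaged
```

Calling `conformer_averaged_descriptor(conformers, charges, energies)` with a list of coordinate sets, one list of partial charges, and one energy per conformer returns one averaged value per atom. In an HDNN-style model, such per-atom vectors (built from many `eta`, `r_s` pairs and angular terms) would be fed to element-specific networks whose outputs are summed, which is what makes the prediction atom-based and independent of system size.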

Content

Supplementary material

Molecular Partition Coefficient from Machine Learning with Polarization and Entropy Embedded Atom-Centered Symmetry Functions
Additional details on the collected datasets, descriptor generation, computational methods for molecular dynamics (MD) simulations and quantum mechanics (QM) calculations, hyper-parameter optimization of the high-dimensional neural networks, and contributions from distinct elements in different environments.