Abstract
The prediction of aqueous solubilities and hydration free energies has been studied extensively in machine-learning applications to chemistry, because water is the sole solvent in living systems. For non-aqueous solutions, however, few machine-learning studies have been undertaken so far, even though solvation plays an important role in many chemical reactions. Here, we introduce a novel machine-learning-based quantitative structure-property prediction method that predicts solvation free energies for a variety of organic solute-solvent systems.
The novelty of our method lies in two separate encoder networks, one for the solvent and one for the solute, which quantify the structural features of the given compounds via word embedding and recurrent layers, together with an attention mechanism that extracts important substructures from the outputs of the recurrent neural networks. The predictor network then calculates the solvation free energy of a given mixture from the features produced by the encoders. Using results from extensive calculations on 2495 solute-solvent mixtures, we demonstrate that our methodology outperforms both ab initio and molecular dynamics (MD) solvation models in terms of estimation error for the solvation energy.
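The data flow described above (two encoders, attention over recurrent outputs, then a predictor) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the method itself: character-level tokens from SMILES strings stand in for the word embedding, a single plain tanh recurrent layer stands in for the recurrent layers, a single learned scoring vector stands in for the attention mechanism, and the names `Encoder`, `Predictor`, `EMBED_DIM`, and `HIDDEN_DIM` are hypothetical. The weights are random and untrained, so the printed number has no physical meaning.

```python
import math
import random

random.seed(0)
EMBED_DIM, HIDDEN_DIM = 8, 6  # hypothetical sizes, chosen only for the sketch


def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these instead."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class Encoder:
    """Embedding -> simple tanh recurrent layer -> attention pooling."""

    def __init__(self, vocabulary):
        # One embedding vector per token (assumption: tokens are characters).
        self.embedding = {tok: rand_matrix(1, EMBED_DIM)[0] for tok in vocabulary}
        self.w_in = rand_matrix(HIDDEN_DIM, EMBED_DIM)
        self.w_rec = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)
        self.attn = rand_matrix(1, HIDDEN_DIM)[0]  # attention scoring vector

    def hidden_states(self, tokens):
        """Run the recurrent layer and keep the hidden state at every step."""
        h = [0.0] * HIDDEN_DIM
        states = []
        for tok in tokens:
            from_input = matvec(self.w_in, self.embedding[tok])
            from_state = matvec(self.w_rec, h)
            h = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
            states.append(h)
        return states

    def attention_weights(self, states):
        """Softmax over per-step scores; larger weight marks a more important token."""
        scores = [sum(a * x for a, x in zip(self.attn, h)) for h in states]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def encode(self, tokens):
        """Attention-weighted sum of hidden states: one feature vector per molecule."""
        states = self.hidden_states(tokens)
        weights = self.attention_weights(states)
        return [sum(w * h[i] for w, h in zip(weights, states)) for i in range(HIDDEN_DIM)]


class Predictor:
    """Single linear layer mapping the joined features to one scalar."""

    def __init__(self):
        self.w = rand_matrix(1, 2 * HIDDEN_DIM)[0]
        self.bias = 0.0

    def predict(self, solute_features, solvent_features):
        joint = solute_features + solvent_features  # list concatenation
        return sum(w * x for w, x in zip(self.w, joint)) + self.bias


# Ethanol as solute and benzene as solvent, written as SMILES strings.
solute, solvent = list("CCO"), list("c1ccccc1")
vocabulary = set(solute) | set(solvent)
solute_encoder, solvent_encoder = Encoder(vocabulary), Encoder(vocabulary)
predictor = Predictor()
estimate = predictor.predict(solute_encoder.encode(solute), solvent_encoder.encode(solvent))
print(f"untrained output: {estimate:.4f}")
```

Keeping the solute and solvent encoders as separate objects mirrors the two-encoder design: each learns its own embedding and recurrent weights, and only the predictor sees both feature vectors. In a trained model the attention weights returned by `attention_weights` would indicate which tokens contribute most to the encoded features.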