Abstract
Approximately 40% of marketed drugs exhibit suboptimal pharmacokinetic profiles. Co-crystallization, in which pairs of molecules form a multicomponent crystal, is a promising strategy to enhance physicochemical properties without compromising pharmacological activity. However, finding promising co-crystal pairs is resource-intensive because of the large and diverse range of possible molecular combinations. We present DeepCocrystal, a novel deep learning approach designed to predict co-crystal formation by processing the ‘chemical language’ from a supramolecular vantage point. Rigorous validation of DeepCocrystal showed a balanced accuracy of 78% in realistic scenarios, outperforming existing models. Explainable AI approaches uncovered the decision-making process of DeepCocrystal, showing its capability to learn chemically relevant aspects of the ‘supramolecular language’ that match experimental co-crystallization patterns. By leveraging properties of molecular string representations, DeepCocrystal can also estimate the uncertainty of its predictions. We harnessed this capability in a challenging prospective study and discovered two novel co-crystals of diflunisal, an anti-inflammatory drug. This study underscores the potential of deep learning, and in particular of chemical language processing, to accelerate co-crystallization, and ultimately drug development, in both academic and industrial contexts. DeepCocrystal is available as an easy-to-use web application at https://deepcocrystal.streamlit.app/.