Learning chemical intuition from humans in the loop


The lead optimization process in drug discovery campaigns is an arduous endeavour in which the input of many medicinal chemists is weighed in order to reach a desired molecular property profile. Building the expertise to drive such projects collaboratively is time-consuming and typically spans many years of a chemist's career. In this work we aim to replicate this process by applying artificial intelligence learning-to-rank techniques to feedback obtained from 35 chemists at Novartis over the course of several months. We exemplify the usefulness of the learned proxies in routine tasks such as compound prioritization, motif rationalization, and biased de novo drug design. Annotated response data is provided, and the developed models and code are made available under a permissive open-source license.
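To make the learning-to-rank idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the feedback can be expressed as pairs in which one compound was preferred over another, that each compound is described by a small numeric feature vector, and that a linear scoring function is enough for illustration. The function names, the two toy features, and the pairwise logistic loss are all assumptions introduced here.

```python
import math
import random


def score(weights, features):
    """Linear scoring function s(x) = w . x (a stand-in for a learned model)."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise_ranker(pairs, n_features, lr=0.1, epochs=200, seed=0):
    """Fit weights so that preferred items receive higher scores.

    pairs: list of (preferred_features, other_features) tuples.
    Minimizes the pairwise logistic loss -log(sigmoid(s(a) - s(b)))
    with plain stochastic gradient descent.
    """
    rng = random.Random(seed)
    weights = [0.0] * n_features
    pairs = list(pairs)  # copy so shuffling does not reorder the caller's list
    for _ in range(epochs):
        rng.shuffle(pairs)
        for preferred, other in pairs:
            margin = score(weights, preferred) - score(weights, other)
            margin = max(-50.0, min(50.0, margin))  # guard against overflow in exp
            # Gradient of the loss w.r.t. w is -(1 - sigmoid(margin)) * (a - b).
            grad_scale = 1.0 / (1.0 + math.exp(margin))  # equals 1 - sigmoid(margin)
            for i in range(n_features):
                weights[i] += lr * grad_scale * (preferred[i] - other[i])
    return weights


# Hypothetical toy data: each compound is a 2-feature vector, and the first
# element of every pair is the one the (imaginary) annotator preferred.
toy_pairs = [
    ([1.0, 0.2], [0.0, 0.3]),
    ([0.8, 0.9], [0.1, 0.5]),
    ([0.6, 0.1], [0.3, 0.1]),
]
weights = train_pairwise_ranker(toy_pairs, n_features=2)

# Rank unseen candidates by their learned score, highest first.
candidates = {"cand_a": [0.9, 0.4], "cand_b": [0.2, 0.4]}
ranking = sorted(candidates, key=lambda name: score(weights, candidates[name]), reverse=True)
```

Once trained, the scoring function can order any set of candidates, which is how such a proxy might be used for compound prioritization. A practical model would replace the linear scorer and toy features with a learned representation of molecular structure.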

Version notes

* New performance plots with random seeds
* Microsoft AI4Science affiliation change
* Updated ChEMBL version


Supplementary material

Supporting Information
Additional results

Supplementary weblinks

MolSkill GitHub repository
Link to GitHub repository with production code, models and data.