CoPriNet: Graph Neural Networks provide accurate and rapid compound price prediction for molecule prioritisation.

Ruben Sanchez-Garcia; Dávid Havasi; Gergely Takács; Matthew C. Robinson; Alpha Lee; Frank von Delft; Charlotte M. Deane

doi:10.26434/chemrxiv-2022-gvk2k-v4

Theoretical and Computational Chemistry

30 June 2022, Version 4

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Compound availability is a critical property for design prioritization across the drug discovery pipeline. Historically, and despite their multiple limitations, compound-oriented synthetic accessibility scores have been used as proxies for this problem. However, the size of the catalogues of commercially available molecules has dramatically increased over the last decade, redefining the problem of compound accessibility as a matter of budget. In this paper we show that if compound prices are the desired proxy for compound availability, then synthetic accessibility scores are not effective strategies for use in selection. Our approach, CoPriNet, is a retrosynthesis-free deep learning model trained on 2D graph representations of compounds alongside their prices extracted from the Mcule catalogue. We show that CoPriNet provides price predictions that correlate far better with actual compound prices than any synthetic accessibility score. Moreover, unlike standard retrosynthesis methods, CoPriNet is rapid, with execution times comparable to popular synthetic accessibility metrics, and thus is suitable for high-throughput experiments including virtual screening and de novo compound generation. While the Mcule catalogue is a proprietary dataset, the CoPriNet source code, the model trained on the proprietary data, and the fraction of the catalogue used as the test dataset (100K compound/price pairs) have been made publicly available at https://github.com/oxpig/CoPriNet.
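As a rough illustration of the kind of comparison the abstract describes (how well a candidate proxy score tracks actual compound prices), the sketch below computes a Spearman rank correlation in plain Python. It is not taken from the CoPriNet repository: the function names (`pearson`, `average_ranks`, `spearman`, `rank_proxies`), the choice of Spearman correlation as the metric, and all numbers in the usage example are assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def rank_proxies(actual_prices, proxies):
    """Map each proxy name to its Spearman correlation with actual prices."""
    return {name: spearman(actual_prices, scores)
            for name, scores in proxies.items()}


if __name__ == "__main__":
    # Placeholder numbers, NOT data or results from the paper.
    actual = [12.0, 35.0, 8.0, 120.0, 60.0]
    proxies = {
        "proxy_a": [10.0, 40.0, 9.0, 100.0, 70.0],
        "proxy_b": [3.1, 2.0, 2.8, 2.5, 4.0],
    }
    for name, rho in rank_proxies(actual, proxies).items():
        print(f"{name}: Spearman rho = {rho:.3f}")
```

A rank-based measure is used here because prices can span several orders of magnitude, so ordering matters more than absolute differences when prioritising compounds. Any predicted price or accessibility score can be passed in as a proxy.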

Keywords

Deep learning

synthetic accessibility

price prediction

Supplementary materials

Title: Supplementary Information

Description: Supplementary Information Sections
