In the last few years, de novo molecular design using machine learning has made great technical progress, but its practical deployment has been less successful. This is mostly owing to the cost and technical difficulty of synthesizing such computationally designed molecules. To overcome these barriers, various methods for synthetic route design using deep neural networks have been studied intensively. However, little progress has been made in designing molecules and their synthetic routes simultaneously. Here, we formulate the problem of simultaneously designing molecules with a desired set of properties and their synthetic routes within the framework of Bayesian inference. The design variables consist of the set of reactants in a reaction network and the topology of that network. The design space is extremely large because it consists of all combinations of purchasable reactants, whose number is often on the order of millions or more. In addition, the designed reaction networks can adopt any topology and are not restricted to simple multistep linear reaction routes. To solve this hard combinatorial problem, we present a sequential Monte Carlo algorithm that recursively designs a synthetic reaction network by sequentially building up single-step reactions. In a case study of designing drug-like molecules from commercially available compounds, the proposed method substantially outperforms heuristic combinatorial search methods in computational efficiency and in the coverage and novelty of the designed molecules with respect to existing compounds.
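The propose, weight, and resample loop behind a sequential Monte Carlo design of this kind can be illustrated with a deliberately small toy. The sketch below is not the authors' implementation: the integer reactant pool, the additive `react` function, the Gaussian-shaped score, and the per-step partial target are all hypothetical stand-ins, and only linear routes are built, whereas the method described above allows arbitrary network topologies.

```python
import math
import random

# Hypothetical stand-in for a catalog of purchasable reactants.
REACTANT_POOL = list(range(1, 21))


def react(product, reactant):
    """Toy single-step 'reaction' (assumption): a real system would call a
    reaction-prediction model here."""
    return product + reactant


def log_score(product, step, n_steps, target, scale):
    """Log-score of a partial route. Assumed heuristic: after `step` of
    `n_steps` reactions, aim for the fraction step / n_steps of the target."""
    if step == 0:
        return 0.0
    partial_target = target * step / n_steps
    return -0.5 * ((product - partial_target) / scale) ** 2


def smc_design(target, n_steps=4, n_particles=200, scale=3.0, seed=0):
    """Build routes one single-step reaction at a time with SMC."""
    rng = random.Random(seed)
    # Each particle is (route so far, current toy property value).
    particles = [([], 0)] * n_particles
    for step in range(1, n_steps + 1):
        proposed, log_w = [], []
        for route, product in particles:
            # Propose: extend the route by one uniformly drawn reactant.
            r = rng.choice(REACTANT_POOL)
            new_product = react(product, r)
            proposed.append((route + [r], new_product))
            # Incremental weight: ratio of successive intermediate targets.
            log_w.append(
                log_score(new_product, step, n_steps, target, scale)
                - log_score(product, step - 1, n_steps, target, scale)
            )
        # Normalize in log space for numerical stability, then resample.
        m = max(log_w)
        weights = [math.exp(lw - m) for lw in log_w]
        particles = rng.choices(proposed, weights=weights, k=n_particles)
    return particles


# Usage: design four-step routes whose toy property is close to 60.
designs = smc_design(target=60)
best_route, best_product = min(designs, key=lambda d: abs(d[1] - 60))
```

Because each step resamples partial routes according to how promising they look, the computational effort concentrates on a small, high-scoring part of the combinatorial space rather than enumerating all reactant combinations.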
Examples of designed products
Products and their synthetic pathway networks were designed using Seq-Stack Reaction.