Virtual Variables-Enabled Generation of Datasets for Prediction in Organic Synthesis: Digitalization of Small Molecules and Application to Functional Molecule Syntheses

21 September 2023, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Generating large datasets from traditional organic synthesis experiments is extremely challenging. Hence, methods that predict outcomes accurately from a limited amount of data are in demand. We propose a molecular technology based on generative artificial intelligence that generates data for unexplored conditions and establishes the most suitable relationships between different small molecules using virtual variables. Our approach reveals relationships among three structurally different small molecules and represents them as virtual variables, which are then used to propose reaction conditions for synthesizing target molecules in high yields. We demonstrate its utility in small molecule synthesis by applying it to the iodination reaction of polyfluoronaphthalenes. By computationally generating data that are inaccessible through reasonable reaction experiments, we successfully optimized the reaction conditions. We introduce a novel application of machine learning as a molecular technology for predicting reaction outcomes from a small dataset containing fewer than 100 data points.

Keywords

Machine learning
Small molecule synthesis
Small data
Prediction of reaction conditions
In silico data generation

Supplementary materials

Supporting Information
1. General information
2. Synthesis and characterization of substrate
3. Preparation of magnesium amide bases
4. Iodination reaction of polyfluoronaphthalenes
5. Initial study of model selection
6. Inverse exploring descriptors from virtual variables
7. DFT calculation
8. Experimental data of iodination reaction
9. Reference
10. NMR spectra
11. Cartesian coordinates
