Artificial Intelligence Guided De Novo Molecular Design Targeting COVID-19

30 June 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

An extensive search for active therapeutic agents against SARS-CoV-2 is being conducted across the globe. Computational docking simulations have traditionally been used for in silico ligand design and remain a popular method for high-throughput screening of therapeutic agents in the fight against COVID-19. Despite the vast chemical space (millions to billions of biomolecules) that could potentially be explored for therapeutic agents, the search for candidate compounds remains severely limited by the high computational cost of the ensemble docking simulations employed in traditional in silico ligand design. Here, we present a de novo molecular design strategy that leverages artificial intelligence to discover new therapeutic biomolecules against SARS-CoV-2. A Monte Carlo Tree Search algorithm, combined with a multi-task neural network (MTNN) surrogate model for expensive docking simulations and recurrent neural networks (RNN) for rollouts, is used to sample the exhaustive SMILES space of candidate biomolecules. Using Vina scores as the target objective to measure binding of therapeutic molecules either to the isolated spike protein (S-protein) of SARS-CoV-2 at its host receptor region or to the S-protein:angiotensin-converting enzyme 2 (ACE2) receptor interface, we generate several (~100s) new biomolecules that outperform FDA (~1000s) and non-FDA (~1 million) biomolecules from existing databases. A transfer learning strategy is deployed to retrain the MTNN surrogate as new candidate molecules are identified; this iterative search-and-retrain strategy is shown to accelerate the discovery of desired candidates. We perform a detailed analysis using Lipinski's rules and also analyze the structural similarities among the top-performing candidates.
We split the molecules using a molecular fragmenting algorithm and identify the common chemical fragments and patterns; such information is important for identifying the moieties responsible for improved performance. Although we focus on therapeutic biomolecules, our AI strategy is broadly applicable to the accelerated design and discovery of any chemical molecules with user-desired functionality.
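
The Lipinski analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which thresholds or tolerance the authors applied, so the code below assumes the commonly cited rule-of-five form (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors, with one violation tolerated). The `Descriptors` container and both function names are hypothetical, and the descriptor values are assumed to have been computed beforehand by a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties of one molecule (hypothetical container)."""
    mol_weight: float        # molecular weight in daltons
    log_p: float             # octanol-water partition coefficient (logP)
    h_bond_donors: int       # number of hydrogen-bond donors
    h_bond_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the names of the rule-of-five criteria that the molecule breaks."""
    checks = [
        ("molecular weight > 500 Da", d.mol_weight > 500),
        ("logP > 5", d.log_p > 5),
        ("H-bond donors > 5", d.h_bond_donors > 5),
        ("H-bond acceptors > 10", d.h_bond_acceptors > 10),
    ]
    return [name for name, violated in checks if violated]


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: a molecule passes if it breaks at most one criterion."""
    return len(lipinski_violations(d)) <= max_violations


if __name__ == "__main__":
    # Illustrative descriptor values only, not taken from the preprint.
    candidate = Descriptors(mol_weight=720.0, log_p=6.3,
                            h_bond_donors=4, h_bond_acceptors=12)
    print(lipinski_violations(candidate))
    print(passes_rule_of_five(candidate))
```

A filter of this kind would typically be applied to generated candidates after docking-based ranking; whether the authors used it as a hard filter or only as a descriptive analysis is not stated in the abstract.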

Keywords

Artificial Intelligence
Machine Learning
COVID-19
de novo molecular design
ligand design
