Artificial Intelligence Guided De Novo Molecular Design Targeting COVID-19

An extensive search for active therapeutic agents against SARS-CoV-2 is being conducted across the globe. Computational docking simulations have traditionally been used for in silico ligand design and remain a popular choice for high-throughput screening of therapeutic agents in the fight against COVID-19. Despite the vast chemical space (millions to billions of biomolecules) that could potentially be explored for therapeutic agents, the search for candidate compounds remains severely limited by the high computational cost of the ensemble docking simulations employed in traditional in silico ligand design. Here, we present a de novo molecular design strategy that leverages artificial intelligence to discover new therapeutic biomolecules against SARS-CoV-2. A Monte Carlo Tree Search algorithm, combined with a multi-task neural network (MTNN) surrogate model for the expensive docking simulations and recurrent neural networks (RNN) for rollouts, is used to sample the exhaustive SMILES space of candidate biomolecules. Using Vina scores as the target objective to measure binding of therapeutic molecules either to the isolated spike protein (S-protein) of SARS-CoV-2 at its host receptor region or to the S-protein:angiotensin-converting enzyme 2 (ACE2) receptor interface, we generate several (~100s) new biomolecules that outperform FDA (~1000s) and non-FDA (~1 million) biomolecules from existing databases. A transfer learning strategy is deployed to retrain the MTNN surrogate as new candidate molecules are identified; this iterative search-and-retrain strategy is shown to accelerate the discovery of desired candidates. We perform a detailed analysis using Lipinski's rules and also analyze the structural similarities between the top-performing candidates.
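As an illustration of the Lipinski analysis, the following is a minimal sketch of a rule-of-five check. It assumes the four descriptors have already been computed by a cheminformatics toolkit; the `Descriptors` class, the function names, and the tolerance of one violation are illustrative assumptions rather than the procedure used in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    mol_weight: float   # molecular weight in daltons
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria are violated."""
    checks = [
        d.mol_weight > 500.0,
        d.log_p > 5.0,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most `max_violations` violations."""
    return lipinski_violations(d) <= max_violations


# Illustrative descriptor values only, not taken from any real compound.
small = Descriptors(mol_weight=350.0, log_p=2.5, h_donors=2, h_acceptors=5)
bulky = Descriptors(mol_weight=720.0, log_p=6.1, h_donors=7, h_acceptors=12)
print(passes_rule_of_five(small), passes_rule_of_five(bulky))
```

Setting `max_violations=0` gives the stricter reading in which every criterion must hold.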
We split the molecules using a molecular fragmenting algorithm and identify the common chemical fragments and patterns; such information is important for identifying the moieties responsible for improved performance. Although we focus on therapeutic biomolecules, our AI strategy is broadly applicable to the accelerated design and discovery of any molecules with user-desired functionality.
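The search loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the implementation used in this work: `toy_score` is a hypothetical stand-in for the MTNN surrogate (lower is better, as with Vina scores), uniform random token sampling replaces the RNN rollout, and the four-token vocabulary does not produce valid SMILES.

```python
import math
import random

TOKENS = ["C", "N", "O", "$"]  # toy vocabulary; "$" terminates a string
MAX_LEN = 8                    # assumed cap on string length


def toy_score(s):
    """Hypothetical surrogate: more N/O tokens gives a lower (better) score."""
    return -float(s.count("N") + s.count("O"))


def is_terminal(s):
    return s.endswith("$") or len(s) >= MAX_LEN


def random_rollout(s, rng):
    """Stand-in for the RNN rollout: complete the string with random tokens."""
    while not is_terminal(s):
        s += rng.choice(TOKENS)
    return s


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}       # token -> child Node
        self.visits = 0
        self.total_reward = 0.0

    def untried(self):
        return [t for t in TOKENS if t not in self.children]


def ucb_child(node, c=1.4):
    """Pick the child maximizing mean reward plus an exploration bonus."""
    def ucb(child):
        mean = child.total_reward / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=ucb)


def mcts(n_iter=500, seed=0):
    rng = random.Random(seed)
    root = Node("")
    best = (float("inf"), None)  # (score, string); lower score is better
    for _ in range(n_iter):
        node = root
        # Selection: descend while the node is fully expanded.
        while not is_terminal(node.state) and not node.untried():
            node = ucb_child(node)
        # Expansion: add one untried token as a new child.
        if not is_terminal(node.state):
            tok = rng.choice(node.untried())
            node.children[tok] = Node(node.state + tok, parent=node)
            node = node.children[tok]
        # Simulation: complete the string and score it with the surrogate.
        final = random_rollout(node.state, rng)
        score = toy_score(final)
        if score < best[0]:
            best = (score, final)
        # Backpropagation: reward is the negated score, so lower scores win.
        while node is not None:
            node.visits += 1
            node.total_reward += -score
            node = node.parent
    return best


best_score, best_string = mcts()
print(best_score, best_string)
```

In an iterative search-and-retrain setting, the best strings from one round would be rescored with the expensive docking simulation and used to update the surrogate before the next round; that step is omitted here.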