Abstract
A graph-based genetic algorithm (GA) is used to identify molecules (ligands) with high absolute docking scores, as estimated by the Glide software, for four different targets: Bacillus subtilis chorismate mutase (CM), the human β2-adrenergic G protein-coupled receptor (β2AR), the DDR1 kinase domain (DDR1), and β-cyclodextrin (BCD). By combining functional group filters with a score modifier based on a heuristic synthetic accessibility (SA) score, our approach (GA+Filter+SA) identifies between ca. 500 and 6000 structurally diverse molecules with scores better than those of known binders by screening a total of 400,000 molecules, starting from 8000 randomly selected molecules from the ZINC database. Conventional screening of 250,000 molecules from the ZINC database identifies significantly more molecules with better docking scores than known binders, with the exception of CM, where the conventional screening approach identifies only 72 compounds compared to 511 with GA+Filter+SA. In the case of β2AR and DDR1, the GA+Filter+SA approach finds significantly more molecules with docking scores lower than -9.0 and -10.0. The GA+Filter+SA docking methodology is thus effective in generating a large and diverse set of synthetically accessible molecules with very good docking scores for a particular target. An early incarnation of the GA+Filter+SA approach was used to identify potential binders to the COVID-19 main protease, which were submitted to the early stages of the COVID Moonshot project, a crowd-sourced initiative to accelerate the development of a COVID antiviral.
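The way a functional group filter and an SA-based score modifier can jointly steer selection may be easier to see in a small sketch. The code below is only an illustration under stated assumptions and not the implementation used in this work: the `Candidate` record, the Gaussian-shaped `sa_modifier`, its `threshold` and `width` values, and the toy scores are all hypothetical. It assumes that docking scores are negative with lower values being better, and that a lower SA score means a molecule is easier to synthesize.

```python
import math
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one molecule in a GA generation."""
    name: str
    docking_score: float   # assumed: more negative is better
    sa_score: float        # assumed: lower means easier to synthesize
    passes_filters: bool   # outcome of the functional group filters


def sa_modifier(sa_score: float, threshold: float = 3.0, width: float = 1.0) -> float:
    """Return a factor in (0, 1]; 1.0 at or below the (hypothetical) threshold,
    decaying like a Gaussian tail above it."""
    if sa_score <= threshold:
        return 1.0
    return math.exp(-0.5 * ((sa_score - threshold) / width) ** 2)


def modified_score(c: Candidate) -> float:
    """Scale the docking score toward zero for hard-to-make molecules."""
    return c.docking_score * sa_modifier(c.sa_score)


def select(candidates: List[Candidate], n: int) -> List[Candidate]:
    """Drop molecules that fail the filters, then keep the n best modified scores."""
    kept = [c for c in candidates if c.passes_filters]
    return sorted(kept, key=modified_score)[:n]


# Toy data, invented purely for illustration.
population = [
    Candidate("A", -10.0, 2.5, True),
    Candidate("B", -12.0, 5.0, True),    # good raw score, poor SA score
    Candidate("C", -13.0, 2.0, False),   # best raw score, fails the filters
]
survivors = select(population, 2)
print([c.name for c in survivors])
```

In this toy population, molecule C is removed by the filter despite having the best raw score, and molecule B ranks behind A because its poor SA score shrinks its modified score. In a GA, a selection step of this kind would be applied to each generation before mating and mutation.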