Crossover Operators for Molecular Graphs with an Application to Virtual Drug Screening

17 September 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Genetic algorithms (GAs) are a powerful method for solving optimization problems with complex cost functions over vast search spaces; they rely in particular on recombining parts of previous solutions. Crossover operators play a crucial role in this context. Here, we describe a large class of these operators designed for searching over spaces of graphs. These operators are based on introducing small cuts into graphs and rejoining the resulting induced subgraphs of two parents. This form of cut-and-join crossover can be restricted in a consistent way to preserve local properties such as vertex degrees (valency) or bond orders, as well as global properties such as graph-theoretic planarity. In contrast to crossover on strings, cut-and-join crossover on graphs is powerful enough to ergodically explore chemical space even in the absence of mutation operators. Extensive benchmarking shows that the offspring of molecular graphs are again plausible molecules with high probability, while at the same time crossover drastically increases diversity compared to the initial molecule libraries. Moreover, desirable properties such as favorable indices of synthesizability are preserved with sufficient frequency that candidate offspring can be filtered efficiently for such properties. As an application, we used cut-and-join crossover in REvoLd, a GA-based system for computer-aided drug design (CADD). In optimization runs searching for ligands binding to four different target proteins, we consistently found candidate molecules with binding constants exceeding those of the best known binders as well as those of candidates found in make-on-demand libraries. Taken together, cut-and-join crossover operators constitute a mathematically simple and well-characterized approach to the recombination of molecules that performed very well in real-life CADD tasks.
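
The cut-and-join idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or the REvoLd code: the function names (`cut`, `cut_and_join`, `degrees`), the edge-list representation, and the random pairing of cut edges are assumptions made for illustration only. Atom types, bond orders, and planarity are ignored. Each parent is cut along a chosen vertex subset, the induced subgraphs are kept, and the dangling half-edges of the two fragments are paired so that every retained vertex keeps its degree.

```python
import random


def cut(edges, keep):
    """Cut a graph along the vertex set `keep` (hypothetical helper).

    Returns the edges of the induced subgraph on `keep` and a list of
    stubs: one entry per cut edge, naming its endpoint inside `keep`.
    """
    inner, stubs = [], []
    for u, v in edges:
        if u in keep and v in keep:
            inner.append((u, v))
        elif u in keep:
            stubs.append(u)
        elif v in keep:
            stubs.append(v)
    return inner, stubs


def cut_and_join(edges_a, keep_a, edges_b, keep_b, rng=random):
    """Join the kept fragments of two parents by pairing their stubs.

    Returns the child's edge list, or None if the two cuts have a
    different number of cut edges. Vertices are relabelled as
    ("a", v) / ("b", v) so the fragments stay disjoint. If two stubs
    connect the same pair of vertices, the child is a multigraph.
    """
    inner_a, stubs_a = cut(edges_a, keep_a)
    inner_b, stubs_b = cut(edges_b, keep_b)
    if len(stubs_a) != len(stubs_b):
        return None
    stubs_b = list(stubs_b)
    rng.shuffle(stubs_b)  # assumption: stubs are paired at random
    child = [(("a", u), ("a", v)) for u, v in inner_a]
    child += [(("b", u), ("b", v)) for u, v in inner_b]
    child += [(("a", u), ("b", v)) for u, v in zip(stubs_a, stubs_b)]
    return child


def degrees(edges):
    """Vertex degrees of an edge list, counting multi-edges."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Parent A: path 0-1-2-3. Parent B: triangle 0-1-2 with pendant vertex 3.
parent_a = [(0, 1), (1, 2), (2, 3)]
parent_b = [(0, 1), (1, 2), (2, 0), (2, 3)]
child = cut_and_join(parent_a, {0, 1}, parent_b, {0, 1, 2})
child_deg = degrees(child)
```

In this toy run each cut removes exactly one edge, so the two fragments are joined by a single new edge and every retained vertex has the same degree in the child as in its parent. Cuts with unequal numbers of cut edges are simply rejected here; a fuller treatment would also have to match bond orders and check the global constraints mentioned above.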

Keywords

Crossover
Graph Theory
Genetic algorithm
Virtual Screening

Supplementary materials

Additional Files
Additional file 1: Proofs related to the strong connectedness of the search space G
Additional file 2: Molecules used in the benchmarking section
Additional file 3: Details on embedding violations observed for crossover products
Additional file 4: Summary of MOSES statistics
