Evolutionary Multiobjective Optimization of Multiligand Metal Complexes in Diverse and Vast Chemical Spaces

21 June 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Transition metal complexes (TMCs) play a key role in several areas of high interest, including medicinal chemistry, renewable energies, and nanoporous materials. The development of new TMCs enabling these technologies remains challenged by the need to optimize multiple properties within large chemical spaces, in which the thirty transition metals of the periodic table can be combined with a virtually infinite number of ligands. In this work, we provide an open dataset, tmQMg-L, including a collection of 30k TMC ligands from the Cambridge Structural Database. tmQMg-L combines size, diversity, and synthesizability at an unprecedented scale. Each ligand is characterized by geometric information and a rich fingerprint including electronic and steric features. The ligand charge and metal-coordination mode were also assigned with a robust algorithm based on graphs and natural bond orbital theory. The tmQMg-L dataset was leveraged in the automated generation of 1.37M TMCs, resulting from all possible combinations between a square planar palladium(II) scaffold and a pool of 50 different ligands. This TMC space was explored with a multiobjective genetic algorithm (MOGA) that optimized two properties over a Pareto front, namely the polarizability (α) and the HOMO-LUMO gap (ε). After exploring only 1% of the whole space (i.e., 13k TMCs), the MOGA yielded 130 diverse hits with maximal α and ε values. Despite the size of the space, the evolution of the hits was easily rationalized by analyzing how the MOGA picked ligands of different natures. Instead of the traditional mutation and crossover of fragments within a single ligand, the MOGA of this work implemented full-ligand genetic operations acting on all coordination sites, enforcing chemical diversity across all populations, including the last one containing the hits.
We believe that the combined use of the tmQMg-L dataset with this MOGA will enable the discovery of TMCs with optimal properties within diverse and vast chemical spaces.
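The abstract combines two algorithmic ingredients: selection over a Pareto front of two maximized objectives (α and ε), and genetic operators that exchange whole ligands at coordination sites rather than fragments within one ligand. The Python sketch below illustrates both under simplifying assumptions that are ours, not the authors': a genome is a tuple of four ligand IDs for the four sites of a square planar Pd(II) scaffold (all ligands treated as monodentate), mutation replaces the complete ligand at one randomly chosen site, and crossover is uniform over sites. The function names `dominates`, `pareto_front`, `full_ligand_mutation`, and `full_ligand_crossover` are hypothetical, and the scores in the usage example are toy values, not data from this work.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical genome: one ligand ID per coordination site of a square planar
# Pd(II) scaffold. IDs index a ligand pool (e.g. 50 ligands); they are placeholders.
Genome = Tuple[int, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b, assuming all objectives are maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, float]]) -> List[int]:
    """Indices of non-dominated points, e.g. (polarizability, HOMO-LUMO gap) pairs."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def full_ligand_mutation(genome: Genome, pool_size: int, rng: random.Random) -> Genome:
    """Replace the complete ligand at one randomly chosen site with a different pool ligand."""
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2 to mutate")
    site = rng.randrange(len(genome))
    choices = [lig for lig in range(pool_size) if lig != genome[site]]
    child = list(genome)
    child[site] = rng.choice(choices)
    return tuple(child)


def full_ligand_crossover(p1: Genome, p2: Genome, rng: random.Random) -> Genome:
    """Uniform crossover over sites: every site inherits a whole ligand from one parent."""
    return tuple(a if rng.random() < 0.5 else b for a, b in zip(p1, p2))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy (alpha, epsilon) values for illustration only.
    scores = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0)]
    print(pareto_front(scores))  # -> [0, 1, 3]
    parent_a, parent_b = (0, 1, 2, 3), (10, 11, 12, 13)
    print(full_ligand_mutation(parent_a, pool_size=50, rng=rng))
    print(full_ligand_crossover(parent_a, parent_b, rng))
```

Because each operator moves an entire ligand, every child is still a valid combination of pool ligands, which is one plausible way such operators keep populations chemically diverse. A complete MOGA would additionally need objective evaluation (e.g. quantum-chemical α and ε) and a selection scheme such as NSGA-II-style nondominated sorting, neither of which is shown here.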

Keywords

evolutionary learning
multiobjective optimization
Pareto front
ligand dataset
ligand charge
genetic algorithms
mutation
crossover
diversity
chemical space
transition metal complex

Supplementary materials

Supporting Information
The Supporting Information provides further details about the curation of the tmQMg-L dataset, the generation of the 1.37M-TMC chemical space, the chemoinformatics descriptors, and the MOGA.