A Deep Generative Model for the Inverse Design of Transition Metal Ligands and Complexes

27 January 2025, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Deep generative models that yield transition metal complexes (TMCs) remain scarce despite the key role of these compounds in industrial catalytic processes, anticancer therapies, and energy transformations. Compared with drug discovery in the chemical space of organic molecules, the design of TMCs poses further challenges, including the encoding of more complex chemical bonds and the need to optimize multiple properties. In this work, we developed a junction tree variational autoencoder (JT-VAE) for the inverse design of transition metal ligands and complexes. After implementing a SMILES-based encoding of the metal–ligand bonds, the model was trained on the tmQMg-L ligand library, allowing for the generation of thousands of novel, highly diverse monodentate (κ1) and bidentate (κ2) ligands. The generated ligands were labeled with two target properties of the associated homoleptic square planar iridium TMCs: the HOMO-LUMO gap (ϵ) and the metal charge (qIr), both computed with a DFT method. These properties were used to implement a conditional JT-VAE model that generated ligands from a prompt, with the single or dual objective of optimizing ϵ, qIr, or both. A similar model was implemented to condition the generation of metal ligands on their solubility and steric bulk. The JT-VAE models were able to navigate both the central and the extreme regions of these two-dimensional property spaces, and decoding their stepwise optimization allowed for chemical interpretation. These optimizations also had an impact on other chemical properties of interest, including ligand dissociation energies and oxidative addition barriers.

Keywords

inverse design
deep learning
variational autoencoders
metal ligands
transition metal complexes
generative models

Supplementary materials

Title: Supporting Information
Description: The supporting information provides further details on RDKit versions, curation of training SMILES, metal–ligand encodings, coordination environments, synthetic accessibility, and the unconditional and conditional models, including DFT computational details, outlier and latent space analyses, evaluation metrics, and the definition of the BkM parameter.
