ChemSpaX: Exploration of Chemical Space by Automated Functionalization of Molecular Scaffold

08 June 2021, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The local chemical space around an experimentally synthesized material can be explored by making slight structural
variations of that material. Generating many molecular structures of reasonable quality that resemble an existing
material with a chemical purpose is needed for high-throughput screening in materials design. Large databases of
the geometries and chemical properties of transition metal complexes are not readily available, even though these
complexes are widely used in homogeneous catalysis. ChemSpaX, a Python-based workflow aimed at automating local
chemical space exploration for any type of molecule, is introduced.
The overall computational workflow of ChemSpaX is explained in detail. ChemSpaX uses 3D information to place
functional groups on an input structure. The input structure can be, for example, a catalyst whose catalytic
activity one wants to improve through high-throughput screening. The newly placed substituents are optimized
using a computationally cheap force-field method. After placement of new substituents, higher-level optimizations
using xTB or DFT instead of force-field optimization are also possible in the current workflow. Representative
applications show that the structures generated by ChemSpaX are of reasonable quality for use in high-throughput
screening. These applications are the investigation of various adducts on functionalized Mn-based pincer
complexes, hydrogenation of Ru-based pincer complexes, functionalization of cobalt porphyrin complexes, and
functionalization of a bipyridyl-functionalized cobalt porphyrin trapped in an M2L4-type cage complex. Descriptors
that can be used in data-driven design and discovery of catalysts, such as the Gibbs free energy of reaction and
the HOMO-LUMO gap, were selected and studied in more detail for these use cases. The descriptors were calculated
with the relatively fast GFN2-xTB method and compared against DFT-calculated descriptors.
ChemSpaX is open-source and aims to bolster the efforts of the scientific community towards data-driven material
discovery.
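
The two descriptors named in the abstract, and the kind of method-versus-method comparison it mentions, can be illustrated with a minimal sketch. This is not ChemSpaX code: the function names and the closed-shell assumption are hypothetical and chosen only for illustration. Only the standard definitions are used (reaction free energy as products minus reactants, gap as LUMO minus HOMO, and a mean absolute error between two sets of values).

```python
from statistics import mean


def reaction_free_energy(products, reactants):
    """Delta G of reaction: sum of product free energies minus sum of reactant free energies.

    All values must be in the same unit (for example Hartree).
    """
    return sum(products) - sum(reactants)


def homo_lumo_gap(orbital_energies, n_occupied):
    """HOMO-LUMO gap for a closed-shell system (assumption of this sketch).

    orbital_energies: orbital energies in any order.
    n_occupied: number of occupied orbitals; must leave at least one virtual orbital.
    """
    energies = sorted(orbital_energies)
    if not 0 < n_occupied < len(energies):
        raise ValueError("need at least one occupied and one unoccupied orbital")
    return energies[n_occupied] - energies[n_occupied - 1]


def mean_absolute_error(values_a, values_b):
    """Mean absolute difference between two equally long descriptor lists,
    for example values from a cheaper method and from a reference method."""
    if len(values_a) != len(values_b):
        raise ValueError("descriptor lists must have the same length")
    return mean(abs(a - b) for a, b in zip(values_a, values_b))
```

In such a comparison, `mean_absolute_error` would be called with one list of descriptor values per method, computed for the same set of structures and in the same order.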

Keywords

data-driven chemistry
open-source tool
automated geometry generation
Quantum Chemical
automated workflows
Transition Metal Complexes
Chemical space
Chemical database
