Theoretical and Computational Chemistry

ChemSpaX: Exploration of Chemical Space by Automated Functionalization of Molecular Scaffold

Vivek Sinha, Delft University of Technology

Abstract

The local chemical space around an experimentally synthesized material can be explored by making slight structural
variations of that material. High-throughput screening in materials design requires generating many such structures,
of reasonable quality, that resemble an existing, chemically purposeful material. Large databases of geometries and
chemical properties of transition metal complexes are not readily available, even though these complexes are widely
used in homogeneous catalysis. ChemSpaX, a Python-based workflow aimed at automating local chemical space
exploration for any type of molecule, is introduced, and its overall computational workflow is explained. ChemSpaX
uses 3D information to place functional groups on an input structure. The input structure can be, for example, a
catalyst that one wants to screen in high throughput to investigate whether its catalytic activity can be improved.
The newly placed substituents are optimized using a computationally cheap force-field method. After placement of
new substituents, higher-level optimizations using xTB or DFT, instead of force-field optimization, are also possible
in the current workflow. Representative applications show that the structures generated by ChemSpaX are of
reasonable quality for use in high-throughput screening. These applications comprise various adducts on
functionalized Mn-based pincer complexes, hydrogenation of Ru-based pincer complexes, functionalization of cobalt
porphyrin complexes, and functionalization of a bipyridyl-functionalized cobalt porphyrin trapped in an M2L4-type
cage complex. Descriptors that can be used in data-driven design and discovery of catalysts, such as the Gibbs free
energy of reaction and the HOMO-LUMO gap, were selected and studied in more detail for these use cases. The
descriptors were calculated with the relatively fast GFN2-xTB method and compared against DFT-calculated
descriptors. ChemSpaX is open-source and aims to bolster the efforts of the scientific community towards
data-driven material discovery.
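
The two descriptors named in the abstract, and the comparison of a cheap method against a reference method, can be illustrated with a minimal Python sketch. It is not taken from ChemSpaX: the function names are hypothetical, the definitions are the standard textbook ones (reaction free energy as products minus reactants, gap as LUMO minus HOMO energy), root-mean-square error is only one possible comparison metric, and the numbers are placeholders rather than results from the paper.

```python
from math import sqrt


def reaction_free_energy(g_products, g_reactants):
    """Gibbs free energy of reaction: sum over products minus sum over reactants.

    All values must be in the same energy unit (e.g. kcal/mol).
    """
    return sum(g_products) - sum(g_reactants)


def homo_lumo_gap(e_homo, e_lumo):
    """HOMO-LUMO gap as the LUMO energy minus the HOMO energy (same unit, e.g. eV)."""
    return e_lumo - e_homo


def rmse(predicted, reference):
    """Root-mean-square error between two equally long descriptor lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return sqrt(sum(squared) / len(squared))


# Placeholder numbers for illustration only; they are NOT data from the paper.
cheap_method_dg = [-10.0, -12.0, -8.0]      # e.g. values from a semi-empirical method
reference_method_dg = [-11.0, -12.0, -10.0]  # e.g. values from a DFT calculation

error = rmse(cheap_method_dg, reference_method_dg)
print(f"RMSE between methods: {error:.3f}")
```

In a screening setting, one such error value per descriptor gives a quick indication of whether the cheaper method tracks the reference closely enough to rank candidate structures.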

Version notes

Version 1 submitted to ChemRxiv; version 2 uploaded.
