Matrix of Orthogonalized Atomic Orbital Coefficients Representation for Radicals and Ions

29 May 2023, Version 2
This content is a preprint and has not undergone peer review at the time of posting.


Chemical (molecular, quantum) machine learning relies on representing molecules in unique and informative ways. Here, we present the matrix of orthogonalized atomic orbital coefficients (MAOC), a quantum-inspired molecular and atomic representation that contains both structural (composition and geometry) and electronic (charge and spin multiplicity) information. MAOC is based on a cost-effective localization scheme that represents localized orbitals via a predefined set of atomic orbitals. The latter can be constructed from small atom-centered basis sets, such as pcseg-0 and STO-3G, in conjunction with a guess (non-optimized) electronic configuration of the molecule. Importantly, MAOC is suitable for representing monatomic, molecular, and periodic systems, and it can distinguish compounds with identical compositions and geometries but distinct charges and spin multiplicities. Using principal component analysis, we constructed a more compact but equally powerful version of MAOC, called PCX-MAOC. To test the performance of full and reduced MAOC against several other representations (CM, SOAP, SLATM, and SPAHM), we used a kernel ridge regression machine learning model to predict frontier molecular orbital energy levels and ground-state single-point energies for chemically diverse neutral and charged, closed- and open-shell molecules from an extended QM7b dataset, as well as from two new datasets, N-HPC-1 (N-heteropolycycles) and REDOX (nitroxyl and phenoxyl radicals, carbonyl and cyano compounds). MAOC affords accuracy that is either similar or superior to that of the other representations for a range of chemical properties and systems.
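
The workflow summarized in the abstract (a fixed-length representation, compressed by principal component analysis and then passed to kernel ridge regression) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the feature matrix is random placeholder data standing in for flattened representation vectors (it is not MAOC), and the function names, the Gaussian kernel, and the hyperparameter values are hypothetical choices, not the settings used in the preprint.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the leading principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data; the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, mean, components


def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def krr_fit(X_train, y_train, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)


def krr_predict(X_train, alpha, X_new, sigma):
    """Predict targets for X_new from the fitted weights."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


# Placeholder data: 50 "molecules", each with a 20-dimensional feature vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
y = rng.normal(size=50)  # stand-in for a target property such as an orbital energy

# Compress to 5 principal components, then fit and predict on the training set.
Z, mean, components = pca_reduce(X, n_components=5)
alpha = krr_fit(Z, y, sigma=1.0, lam=1e-6)
pred_train = krr_predict(Z, alpha, Z, sigma=1.0)
```

In practice the kernel width, the regularization strength, and the number of retained components would be chosen by cross-validation, and accuracy would be assessed on held-out molecules rather than on the training set.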


molecular representation
quantum machine learning
redox-active molecule
density functional theory
molecular orbital

Supplementary materials

Supplementary Material
Additional computational and data analysis details, additional data, and a full set of learning curves

