These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information. For more information, please see our FAQs.
Preprints are manuscripts made publicly available before they have been submitted for formal peer review and publication. They might contain new research findings or data. Preprints can be a draft or final version of an author's research but must not have been accepted for publication at the time of submission.
For high-throughput screening of materials for heterogeneous catalysis, scaling relations provides an efficient scheme to estimate the chemisorption energies of hydrogenated species. However, conditioning on a single descriptor ignores the model uncertainty and leads to sub optimal prediction of the chemisorption energy. In this paper, we extend the single descriptor linear scaling relation to a multi descriptor linear regression models to leverage the correlation between adsorption energy of any two pair of adsorbates. With a large dataset, we use Bayesian Information Criteria (BIC) as the model evidence to select the best linear regression model that are derived from non-informative priors. Furthermore, Gaussian Process Regression (GPR) based on the meaningful convolution of physical properties of the metal-adsorbate complex can be used to predict the baseline residual of the selected model. This integrated Bayesian model selection and Gaussian process regression, dubbed as residual learning, can achieve performance comparable to standard DFT error (0.1 eV) for most adsorbate system. For sparse and small datasets, we propose an ad hoc Bayesian Model Averaging (BMA) approach to make a robust prediction. With this Bayesian framework, we significantly reduce the model uncertainty and improve the prediction accuracy. The possibilities of the framework for high-throughput catalytic materials exploration in a realistic setting is illustrated using large and small sets of both dense and sparse simulated dataset generated from a public database of bimetallic alloys available in Catalysis-Hub.org.