ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information. For more information, please see our FAQs.

Identifying Domains of Applicability of Machine Learning Models for Materials Science

preprint
revised on 20.12.2019 and posted on 23.12.2019 by Christopher Sutton, Mario Boley, Luca M. Ghiringhelli, Matthias Rupp, Jilles Vreeken, Matthias Scheffler
We present an extension to the usual machine-learning workflow that identifies the domain of applicability of a fitted model, i.e., the region of its input domain where it is most accurate. We apply this approach to several vastly different but commonly used materials representations (namely the n-gram approach, SOAP, and the many-body tensor representation), which are practically indistinguishable when their performance is judged by a single error statistic. Moreover, these models appear unsatisfactory for screening applications because they fail to reliably identify the ground-state polymorphs. Applying our newly developed analysis to each model, we identify its domain of applicability through a simple set of interpretable conditions. We show that identifying the domain of applicability for formation-energy prediction enables a more accurate ground-state search, a crucial step in the discovery of novel materials.
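To make the idea of "a simple set of interpretable conditions" concrete, here is a minimal, hypothetical sketch. It is not the authors' algorithm. It assumes a feature matrix `X` describing each material and the per-sample absolute errors `errors` of an already-fitted model. It then exhaustively searches conjunctions of at most two threshold conditions such as `x_j <= t`. Each candidate region is scored by coverage × (global mean error − region mean error). That scoring rule, the quantile-based thresholds, and the function name `find_domain` are illustrative assumptions chosen for this example.

```python
from itertools import combinations

import numpy as np


def find_domain(X, errors, n_thresholds=5):
    """Hypothetical sketch: return the conjunction of <= 2 threshold conditions
    whose region maximizes coverage * (global mean error - region mean error)."""
    n, d = X.shape
    global_err = errors.mean()

    # Candidate atomic conditions (feature index, operator, threshold);
    # thresholds are taken at evenly spaced quantiles of each feature (assumption).
    atoms = []
    for j in range(d):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, n_thresholds)):
            atoms.append((j, "<=", float(t)))
            atoms.append((j, ">=", float(t)))

    def mask(atom):
        j, op, t = atom
        return X[:, j] <= t if op == "<=" else X[:, j] >= t

    best, best_score = (), 0.0
    # Single conditions first, then all pairwise conjunctions.
    candidates = [(a,) for a in atoms] + list(combinations(atoms, 2))
    for cand in candidates:
        m = np.ones(n, dtype=bool)
        for atom in cand:
            m &= mask(atom)
        if not m.any():
            continue  # empty region: nothing to score
        coverage = m.mean()
        score = coverage * (global_err - errors[m].mean())
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


# Usage on synthetic data: the "model" is accurate only where feature 0 is small.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
errors = np.where(X[:, 0] <= 0.5, 0.1, 1.0)
best, score = find_domain(X, errors)
print(best, round(score, 3))
```

The exhaustive search keeps the selected region readable as a short rule, such as "feature 0 ≤ 0.5". That readability is the property the abstract emphasizes. A practical tool would likely use a smarter search and a held-out set to avoid overfitting the region.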

Funding

European Union’s Horizon 2020 Research and Innovation Programme (grant agreement No. 676580)

NOMAD laboratory CoE

ERC:TEC1P (No. 740233)

Email Address of Submitting Author

sutton@fhi-berlin.mpg.de

Institution

Fritz Haber Institute of the Max Planck Society

Country

Germany

ORCID For Submitting Author

0000-0003-3212-1168

Declaration of Conflict of Interest

no conflict of interest
