These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Can Machine Learning Find Extraordinary Materials?

submitted on 08.08.2019 and posted on 09.08.2019 by Steven Kauwe, Jake Graser, Ryan Murdock, Taylor Sparks

One of the most common criticisms of machine learning is the assumed inability of models to extrapolate, i.e., to identify extraordinary materials with properties beyond those present in the training data set. To test this assumption, this work uses properties calculated with density functional theory (bulk modulus, shear modulus, thermal conductivity, thermal expansion, band gap, and Debye temperature) to determine whether machine learning is truly capable of predicting materials with properties that extend beyond previously seen values. We refer to these materials as extraordinary, meaning they represent the top 1% of values in the available data set. Interestingly, we show that even when a machine learning model is trained on a fraction of the bottom 99%, we can consistently identify 3/4 of the highest-performing compositions for all considered properties, with a precision that is typically above 0.5. Moreover, we investigate a few different modeling choices and demonstrate that a classification approach can identify an equivalent number of extraordinary compounds but with significantly fewer false positives than a regression approach. Finally, we discuss cautions and potential limitations in implementing such an approach to discover new record-breaking materials.
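
The evaluation described in the abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes that "extraordinary" means the top 1% of property values and that a model has produced some list of flagged candidates. The function names `split_extraordinary` and `precision_recall`, the toy property values, and the hand-picked `flagged` list are all hypothetical and serve only to show how recall (the fraction of extraordinary compounds found) and precision (the fraction of flagged compounds that are truly extraordinary) would be computed.

```python
from typing import Iterable, List, Sequence, Tuple


def split_extraordinary(
    values: Sequence[float], top_fraction: float = 0.01
) -> Tuple[List[int], List[int]]:
    """Split indices into (ordinary, extraordinary) by property value.

    The highest `top_fraction` of values are treated as extraordinary;
    a model would be trained only on (part of) the ordinary indices.
    """
    n_top = max(1, round(len(values) * top_fraction))
    order = sorted(range(len(values)), key=lambda i: values[i])
    return order[:-n_top], order[-n_top:]


def precision_recall(
    flagged: Iterable[int], extraordinary: Iterable[int]
) -> Tuple[float, float]:
    """Precision and recall of the flagged indices against the true top set."""
    flagged_set, true_set = set(flagged), set(extraordinary)
    true_positives = len(flagged_set & true_set)
    precision = true_positives / len(flagged_set) if flagged_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


# Toy data (assumption): 400 compounds whose property value equals their index.
values = [float(i) for i in range(400)]
ordinary, extraordinary = split_extraordinary(values)

# Hand-picked stand-in for model output, not a real prediction.
flagged = [399, 398, 397, 300, 120]
precision, recall = precision_recall(flagged, extraordinary)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case, three of the four extraordinary compounds are flagged alongside two false positives, so recall is 0.75 and precision is 0.60. These numbers follow from the hand-picked list and say nothing about the models in the paper.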


NSF 1651668


University of Utah


United States of America

Declaration of Conflict of Interest

No conflict of interest