These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Is Domain Knowledge Necessary for Machine Learning Materials Properties?

Submitted on 20.02.2020, 19:32 and posted on 21.02.2020, 11:17 by Ryan Murdock, Steven Kauwe, Anthony Wang, and Taylor Sparks.
New methods for describing materials as vectors, so that their properties can be predicted with machine learning, are common in the field of materials informatics. However, little is known about the comparative efficacy of these methods. This work aims to clarify which featurization methods should be used under which circumstances. Surprisingly, we find that simple one-hot encoding of elements can be as effective as traditional and newer descriptors when large amounts of data are available. However, when datasets are small or not fully representative, we show that domain knowledge offers advantages in predictive ability.
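To make the contrast between the two kinds of featurization concrete, the following minimal Python sketch turns a chemical formula into (a) a fractional one-hot vector with one slot per element and (b) a single domain-knowledge descriptor, the composition-weighted mean of an elemental property. This is not the authors' code. The six-element vocabulary, the helper names (`parse_formula`, `fractional_onehot`, `mean_electronegativity`), the choice of Pauling electronegativity as the elemental property, and the assumption that each one-hot slot holds the element's atomic fraction (rather than a 0/1 flag) are all illustrative assumptions. The parser handles only simple formulas without parentheses.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Illustrative (assumed) six-element vocabulary; a real featurizer would
# include every element that can appear in the dataset.
ELEMENTS = ["Li", "O", "Na", "Cl", "Fe", "Co"]

# Pauling electronegativities for the same elements, standing in for
# "domain knowledge" about each element.
PAULING_EN = {"Li": 0.98, "O": 3.44, "Na": 0.93, "Cl": 3.16, "Fe": 1.83, "Co": 1.88}


def parse_formula(formula: str) -> dict[str, float]:
    """Parse a simple formula such as 'Fe2O3' into {element: amount}.

    Parentheses, hydrates, and charges are not supported.
    """
    counts = defaultdict(float)
    # Each match is (element symbol, optional numeric amount).
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] += float(amount) if amount else 1.0
    if not counts:
        raise ValueError(f"could not parse formula: {formula!r}")
    return dict(counts)


def fractional_onehot(formula: str) -> list[float]:
    """One slot per element in ELEMENTS, holding that element's atomic fraction."""
    counts = parse_formula(formula)
    unknown = set(counts) - set(ELEMENTS)
    if unknown:
        raise KeyError(f"elements outside the vocabulary: {sorted(unknown)}")
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in ELEMENTS]


def mean_electronegativity(formula: str) -> float:
    """Composition-weighted mean Pauling electronegativity of a formula."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return sum(PAULING_EN[el] * n for el, n in counts.items()) / total


if __name__ == "__main__":
    for f in ["NaCl", "Fe2O3", "LiCoO2"]:
        print(f, fractional_onehot(f), round(mean_electronegativity(f), 3))
```

The sketch illustrates the trade-off the abstract describes. The one-hot vector grows with the vocabulary and treats every element as unrelated: Li and Na occupy separate slots exactly as Li and O do, so any chemical similarity has to be learned from data. The electronegativity feature encodes one such relationship directly, which is one intuitive reading of why domain knowledge helps most when data are scarce or unrepresentative.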


National Science Foundation

CAREER: SusChEM: Data Mining to Reduce the Risk in Discovering New Sustainable Thermoelectric Materials

Directorate for Mathematical & Physical Sciences



Institution: University of Utah, United States

The peer-reviewed version of this work is published in Integrating Materials and Manufacturing Innovation.