ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Is Domain Knowledge Necessary for Machine Learning Materials Properties?

preprint
submitted on 20.02.2020, 19:32 and posted on 21.02.2020, 11:17 by Ryan Murdock, Steven Kauwe, Anthony Wang, Taylor Sparks
New methods for describing materials as vectors, so that their properties can be predicted with machine learning, are common in the field of materials informatics. However, little is known about the comparative efficacy of these methods. This work sets out to clarify which featurization methods should be used under which circumstances. Surprisingly, we find that simple one-hot encoding of elements can be as effective as traditional and newer descriptors when large amounts of data are available. However, when datasets are small or not fully representative, we show that domain knowledge offers advantages in predictive ability.
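The abstract contrasts two ways of turning a chemical formula into a feature vector: a domain-agnostic one-hot-style encoding of elements and a descriptor built from elemental properties (domain knowledge). The sketch below illustrates that contrast under stated assumptions; it is not the authors' code. The element vocabulary, the two-property table (atomic number and Pauling electronegativity), the fractional form of the one-hot vector, and the names `parse_formula`, `one_hot_fractions` and `weighted_mean_descriptor` are all hypothetical choices made for illustration.

```python
import re

# Hypothetical, deliberately tiny element vocabulary (illustration only).
ELEMENTS = ["H", "Li", "C", "O", "Na", "Si", "Cl", "Fe"]

# Illustrative elemental properties: (atomic number, Pauling electronegativity).
PROPERTIES = {
    "H": (1, 2.20),
    "Li": (3, 0.98),
    "C": (6, 2.55),
    "O": (8, 3.44),
    "Na": (11, 0.93),
    "Si": (14, 1.90),
    "Cl": (17, 3.16),
    "Fe": (26, 1.83),
}


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into {element: amount}.

    Simplified parser: parentheses, hydrates and charges are not supported.
    """
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    unknown = set(counts) - set(ELEMENTS)
    if unknown:
        raise ValueError(f"elements outside the vocabulary: {sorted(unknown)}")
    return counts


def one_hot_fractions(formula):
    """Domain-agnostic encoding: one slot per element, holding its atomic fraction."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in ELEMENTS]


def weighted_mean_descriptor(formula):
    """Domain-knowledge encoding: fraction-weighted mean of each elemental property."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    n_props = len(next(iter(PROPERTIES.values())))
    return [
        sum(counts[el] / total * PROPERTIES[el][i] for el in counts)
        for i in range(n_props)
    ]


if __name__ == "__main__":
    print(one_hot_fractions("Fe2O3"))         # Fe and O slots hold 0.4 and 0.6
    print(weighted_mean_descriptor("Fe2O3"))  # mean atomic number and electronegativity
```

The one-hot vector only says which elements are present and in what proportion, so a model must learn chemical similarity from data; the property-weighted descriptor builds that similarity in directly, which is one plausible reading of why domain knowledge helps most when data are scarce.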

Funding

National Science Foundation

CAREER: SusChEM: Data Mining to Reduce the Risk in Discovering New Sustainable Thermoelectric Materials

Directorate for Mathematical & Physical Sciences

Email Address of Submitting Author

sparks@eng.utah.edu

Institution

University of Utah

Country

United States

ORCID For Submitting Author

0000-0001-8020-7711

Declaration of Conflict of Interest

none

Read the published paper in Integrating Materials and Manufacturing Innovation