Is Domain Knowledge Necessary for Machine Learning Materials Properties?

New methods for describing materials as vectors in order to predict their properties using machine learning are common in the field of material informatics. However, little is known about the comparative efficacy of these methods. This work sets out to make clear which featurization methods should be used across various circumstances. Our findings include, surprisingly, that simple one-hot encoding of elements can be as effective as traditional and new descriptors when using large amounts of data. However, in the absence of large datasets or data that is not fully representative we show that domain knowledge offers advantages in predictive ability.