Abstract
Developing transferable, universal machine learning models is a growing trend in data-driven materials research. However, how effectively large machine learning models can be adapted to a specific research domain remains unclear. In this work, we use high-entropy materials as a platform and develop a specialized dataset of 145,323 DFT-relaxed high-entropy materials. We use this dataset to explore the role of domain-specific knowledge in training models for a broad chemical space. Our tests with three representative graph neural network architectures indicate that model complexity has a much smaller influence on performance than the inclusion of critical domain knowledge. Specifically, including low-energy atomic orderings, structures with diverse elemental coverage, and high-order interactions significantly improves model performance. We also find that domain-knowledge-driven sampling tends to be more effective than commonly used unsupervised learning techniques. This research highlights that developing specialized datasets is more beneficial than further complicating deep learning architectures. In addition, physics-inspired sampling algorithms are urgently needed in AI development for materials science.
Supplementary materials: Supplementary Information