The Impact of Domain Knowledge on Universal Machine Learning Models

05 August 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Developing transferable universal machine learning models is a growing trend in data-driven materials research. However, the effectiveness of adapting large machine learning models to specific research domains remains unclear. In this work, we choose high entropy materials as a platform and develop a specialized dataset of 145,323 DFT-relaxed high entropy materials. This dataset is used to explore the role of domain-specific knowledge in training models for a broad chemical space. Our tests with three representative graph neural network architectures indicate that model complexity has a much smaller influence on performance than the inclusion of critical domain knowledge. Specifically, the inclusion of low-energy atomic orderings, structures with diverse elemental coverage, and high-order interactions significantly improves model performance. We also find that domain knowledge-driven sampling tends to be more effective than commonly used unsupervised learning techniques. This research highlights that developing specialized datasets is more beneficial than further complicating deep learning architectures. Additionally, physics-inspired sampling algorithms are critically needed in AI development for materials science.

Keywords

High entropy
Machine learning
Universal machine learning model
Domain knowledge

Supplementary materials

Supplementary Information
