Machine Learning for Materials Scientists: An Introductory Guide Towards Best Practices

06 May 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


This Editorial is intended for materials scientists interested in performing machine learning-centered research.

We cover broad guidelines and best practices for obtaining and treating data; feature engineering; model training, validation, evaluation, and comparison; popular repositories for materials data and benchmarking datasets; model and architecture sharing; and, finally, publication.
In addition, we include interactive Jupyter notebooks with example Python code to demonstrate some of the concepts, workflows, and best practices discussed.

Overall, the data-driven methods, machine learning workflows, and related considerations are presented in a simple way, so that interested readers can guide their machine learning research more intelligently using the suggested references, best practices, and their own materials domain expertise.


machine learning
neural networks
materials science
materials informatics
data science
common pitfalls
best practices
example code
Jupyter notebooks
interactive notebooks

Supplementary materials

BestPractices paper - SI

