High-throughput Calculation of Atomic Planar Density for Compounds

Element-wise planar densities are calculated for a large collection of compounds obtained from Materials Project using brute-force computational geometry methods. We demonstrate that the element-wise max lattice plane densities can be useful as machine learning features. The methods described here are implemented in an open-source Mathematica package hosted at https://github.com/sgbaird/LatticePlane.

Traditionally, planar density calculations have been performed manually and one structure at a time (Fan, 2016). Automated computation of planar densities of arbitrary crystals for arbitrary lattice planes is now possible thanks to three developments: a consistent representation of arbitrary crystals in the form of crystallographic information files (CIFs) (Hall et al., 1991); extensive databases that contain collections of CIFs, such as Materials Project (Jain et al., 2013) or Open Quantum Materials Database (Saal et al., 2013); and advanced 3D computational geometry libraries in software such as Mathematica (Wolfram, 2021), MATLAB (MATLAB, 2021), and Python (Vollprecht, 2021). In this work, we present such a workflow using Mathematica's built-in computational geometry functions in conjunction with MaXrd (Ramsnes et al., 2019), which facilitates both importing CIF files and handling crystal symmetry domain knowledge. This workflow is applied to the full collection of Materials Project compounds as obtained via the Materials API (Ong et al., 2015).
To our knowledge, this is the largest collection of planar densities, covering lattice planes with Miller indices ranging from -3 to 3.
While a neat analytical formula describing lattice plane densities in arbitrary crystal structures may exist, our method is a brute-force approach that relies on numerical geometric computations. This provides flexibility in the treatment of lattice sites (Section 2.3), which might otherwise require extensive tailoring or entirely new analytical formulas. A relevant analytical approach has been published (Fan, 2016); however, the correctness of this method has been questioned (Liu et al., 2020) without further rebuttal as of July 2021. Additionally, our database has the benefit of containing both summed and element-wise atomic planar densities (i.e., planar densities for each chemical element in a compound).
We describe the methods for calculating lattice plane density (Section 2) and demonstrate that this database can be used to supply features to machine learning models to predict material properties via a case study of bulk modulus (Section 3).

Planar Density Methods
We describe our approach for calculating atomic planar densities for an arbitrary Miller plane in Section 2.1. When computing planar densities for multiple Miller planes, we improve computational efficiency by considering only the Miller planes that are unique with respect to crystallographic symmetry. We then expand these into the degenerate (full) representation as a post-processing step (Section 2.2). These methods have been incorporated into a Mathematica package called LatticePlane hosted at https://github.com/sgbaird/LatticePlane. A description of various model parameters is given in Section 2.3. Bulk downloading of Materials Project CIF files and target property data is handled via the Materials API (Ong et al., 2015).
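The idea of computing only symmetry-unique Miller planes and then expanding to the full set can be illustrated with a minimal Python sketch. It is not the LatticePlane implementation: it assumes the cubic point group m-3m (all permutations and sign changes of the indices) purely for illustration, and the function names are hypothetical. Other crystal systems would need their own symmetry operations.

```python
from itertools import permutations, product


def all_miller_indices(max_index=3):
    """All (h, k, l) with entries in [-max_index, max_index], excluding (0, 0, 0)."""
    rng = range(-max_index, max_index + 1)
    return [hkl for hkl in product(rng, repeat=3) if hkl != (0, 0, 0)]


def cubic_equivalents(hkl):
    """Planes equivalent to hkl under m-3m (assumed symmetry, illustration only)."""
    equivalents = set()
    for perm in permutations(hkl):
        for signs in product((1, -1), repeat=3):
            equivalents.add(tuple(s * v for s, v in zip(signs, perm)))
    return equivalents


def unique_planes(planes):
    """Map one representative per symmetry class to its set of equivalent planes."""
    classes = {}
    for hkl in planes:
        equivalents = cubic_equivalents(hkl)
        representative = max(equivalents)  # deterministic choice, e.g. (3, 2, 1)
        classes.setdefault(representative, equivalents)
    return classes


def expand(density_by_representative, classes):
    """Post-processing: copy each representative's density to all equivalent planes."""
    return {
        hkl: density_by_representative[rep]
        for rep, equivalents in classes.items()
        for hkl in equivalents
    }


planes = all_miller_indices(3)
classes = unique_planes(planes)
```

Under this assumed symmetry, the expensive density calculation would run once per representative plane, and `expand` would restore the full (degenerate) set afterwards.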

Calculation of Lattice Plane Density
Calculating the planar density for a Miller plane uses a brute-force numerical approach and involves the following steps, separated into Setup, Intersections, and Post-Processing categories: 1. Setup (a) Import CIF file via MaXrd  Figure 1a and b, respectively.

Mathematica Package Model Parameters
One of the benefits of using a brute force numerical approach to compute planar densities is that the model parameters can be freely tuned for a chosen application.
Relevant LatticePlane model parameters are given in Table 1.
Atoms/sites can be treated as nodes, hard spheres, or probability distributions.
While the code package implements hard sphere and probability distribution models, node density can effectively be obtained by treating the atoms as hard spheres, considering only atoms whose centers are within a tight (e.g., numerical) tolerance of the lattice plane of interest, and normalizing the intersection area by the area of the hard sphere for each atom. Node density values remain unchanged for larger supercells; however, for finite-radius hard sphere and probability distribution models, planar densities that include sub-hemisphere slices of atoms (i.e., atoms whose centers do not lie exactly on the Miller plane) converge with increasing supercell size. Examples of intersections within a 3 × 3 × 3 supercell are shown in Figure 2.
Among the other model parameters, the atom radius factor scales the covalent radius of the atoms and defaults to 1. Additionally, increasing the max Miller index increases the computational cost (Section 2.2).
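The distinction between hard sphere slices and node counting can be made concrete with a short Python sketch. This is an illustration under stated assumptions, not the LatticePlane code: all function names are hypothetical, the lattice is given as three Cartesian vectors, and the circular cross-section of a sphere of radius r cut at distance d from its center is taken as π(r² − d²).

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def interplanar_spacing(hkl, a, b, c):
    """d_hkl = 1 / |h*b1 + k*b2 + l*b3|, with reciprocal vectors a_i . b_j = delta_ij."""
    volume = dot(a, cross(b, c))
    recip = [tuple(x / volume for x in cross(b, c)),
             tuple(x / volume for x in cross(c, a)),
             tuple(x / volume for x in cross(a, b))]
    g = tuple(sum(hkl[i] * recip[i][j] for i in range(3)) for j in range(3))
    return 1.0 / math.sqrt(dot(g, g))


def plane_offset(frac_xyz, hkl, a, b, c):
    """Distance from an atom (fractional coordinates) to the nearest (hkl) plane."""
    phase = dot(hkl, frac_xyz)  # h*x + k*y + l*z; an integer when the atom is on a plane
    return abs(phase - round(phase)) * interplanar_spacing(hkl, a, b, c)


def slice_area(covalent_radius, offset, radius_factor=1.0):
    """Area of the circle where the plane cuts a hard sphere (0 if it misses)."""
    radius = radius_factor * covalent_radius
    if abs(offset) >= radius:
        return 0.0
    return math.pi * (radius ** 2 - offset ** 2)


def node_weight(covalent_radius, offset, tol=1e-8):
    """Node model: count only atoms centered on the plane, normalized to 1 each."""
    if abs(offset) > tol:
        return 0.0
    return slice_area(covalent_radius, offset) / (math.pi * covalent_radius ** 2)


# Hypothetical cubic cell with a 4 Angstrom lattice parameter
a, b, c = (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)
```

Dividing the summed slice areas (or node weights) by the area of the plane region inside the supercell would then give a planar density. That last step is omitted here because it depends on the polygon clipping used.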

Potential for Machine Learning
Adding element-wise max lattice plane density as machine learning features improved predictive performance relative to training only on elemental presence (Figure 5).
The R² value improved from 0.600 ± 0.031 to 0.673 ± 0.029. Mathematica's built-in Predict function was used. Predictions of bulk modulus using only elemental presence are summarized in Figure 6 to facilitate comparison with the machine learning model which incorporated element-wise max lattice plane densities.
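As a rough illustration of the two ingredients above, the following Python sketch builds a feature vector from elemental presence plus element-wise max planar density and evaluates the coefficient of determination. It rests on assumptions not stated in the text: "element-wise max" is read as the maximum over all computed lattice planes for each element, presence is encoded over a fixed element list (a different encoding from the zero-padded nominal data used for Figure 6), and R² uses the standard 1 − SS_res/SS_tot definition. The density values are made up for the example.

```python
def elementwise_max_density(densities):
    """densities: {hkl: {element: planar_density}} -> {element: max over planes}."""
    best = {}
    for per_element in densities.values():
        for element, value in per_element.items():
            best[element] = max(value, best.get(element, 0.0))
    return best


def feature_vector(densities, elements):
    """Presence flags followed by max planar densities, in a fixed element order."""
    best = elementwise_max_density(densities)
    presence = [1.0 if el in best else 0.0 for el in elements]
    max_density = [best.get(el, 0.0) for el in elements]
    return presence + max_density


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-plane, per-element densities for a two-element compound
example = {
    (1, 0, 0): {"Na": 0.03, "Cl": 0.0},
    (1, 1, 1): {"Na": 0.01, "Cl": 0.07},
}
features = feature_vector(example, ["Cl", "Na", "O"])
```

Any regression model could consume such vectors. The results reported here were obtained with Mathematica's Predict function, not with this sketch.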
While the machine learning was performed with respect to a bulk property, the dataset may be better suited to target properties which are planar in nature. An example of such an application is the Open Catalyst Project (Chanussot et al., 2021).

Conclusion
We created a computational workflow which calculates element-wise atomic planar densities using crystallographic information files (CIFs) as inputs and applied this to a large database of compounds. We tested the usefulness of this dataset for machine learning applications and found that, in a simple test case with bulk modulus as the target property, adding max element-wise atomic planar density as a feature improves predictive accuracy relative to learning on elemental presence only.

Future Work
While we chose to focus on node density, incorporating pymatgen charge decoration (Composition module) can help with choosing radii for elements based on their covalent, ionic, and metallic radii for the hard sphere and probability density models, and may offer better-quality information for data science models. We hope to see this dataset used in crystallographic-orientation-dependent applications such as the Open Catalyst Project in pursuit of surrogate models for computationally complex physics-based simulations.

Figure 6. Machine learning predictions of bulk modulus using only elemental presence as nominal data. Compounds with fewer than 6 elements were zero-padded.

Synopsis
A large database of CIF files is used in conjunction with computational geometry software to calculate element-wise planar densities up to a max HKL index of 3.