A recommendation system to predict missing adsorption properties of nanoporous materials

Nanoporous materials (NPMs) selectively adsorb and concentrate gases into their pores, and thus could be used to store, capture, and sense many different gases. Modularly synthesized classes of NPMs, such as covalent organic frameworks (COFs), offer a large number of candidate structures for each adsorption task. A complete NPM-property table, containing measurements of the relevant adsorption properties in the candidate NPMs, would enable the matching of NPMs with adsorption tasks. However, in practice, the NPM-property matrix is only partially observed (incomplete): (i) many properties of any given NPM have not been measured, and (ii) any given property has not been measured for all NPMs. The idea in this work is to leverage the observed (NPM, property) values to impute the missing ones. Similarly, commercial recommendation systems impute missing entries in an incomplete item-customer ratings matrix to recommend items to customers. We demonstrate a COF recommendation system to match COFs with adsorption tasks by training a low rank model of an incomplete COF-adsorption-property matrix. A low rank model, trained on the observed (COF, adsorption property) values, provides (i) predictions of the missing (COF, adsorption property) values and (ii) a "map" of COFs, wherein COFs with similar (dissimilar) adsorption properties congregate (separate). We find the performance of the COF recommendation system varies for different adsorption tasks and diminishes precipitously once the fraction of missing entries exceeds ca. 60 %.


Figure 1:
A recommendation system for nanoporous materials (NPMs). A toy NPM-adsorption-property matrix is illustrated. Entry (i, j) of the matrix represents the value of adsorption property j of NPM i. Many entries are unobserved (white) because measurements are missing. The goal of our NPM recommendation system is to use the observed entries to impute the unobserved entries, allowing recommendation of NPMs for various adsorption tasks (each requiring a certain adsorption property). This is analogous to commercial recommendation systems that aim to recommend customer-specific items to customers, with NPM : item :: adsorption property : customer.
The vastness of NPM "space" [17] and the many adsorption-based applications of NPMs give rise to two important optimization problems. The first is application-led NPM search [1], where the goal is to search for an NPM structure with an optimal adsorption property for a desired application. The second is material-led application search [18], where the goal is to search for the most suitable application of a given NPM.
Given a list of candidate NPM structures and a list of adsorption properties relevant to various applications, the complete NPM-property matrix containing measurements for every (NPM, property) pair reduces both application-led NPM search and material-led application search to look-up problems. However, in practice, the NPM-property matrix, constructed from experimental data collected from the literature [19] and/or databases of simulated gas adsorption in libraries of NPMs [18], is likely incomplete, because many (NPM, property) values have not been observed. That is, (i) for any given NPM, only a proportion of its adsorption properties have been measured, and (ii) any given adsorption property has been measured in only a proportion of the NPMs. See Fig. 1.
The idea in this work is to leverage the observed (NPM, property) values to predict the missing ones, i.e., to impute the missing values of, or complete, the NPM-property matrix. This prediction task, of interest primarily to recommend NPMs for specific adsorption tasks, is analogous to a commercial recommendation system for items to customers (see Box and Fig. 1), materials : items :: properties : customers. A machine learning strategy to complete the NPM-property matrix is much less expensive and time-consuming than experimentally measuring or computationally simulating these missing properties. The machine-completed NPM-property matrix is valuable because it can be used to direct higher-fidelity but more expensive experimental measurements of properties towards the most promising materials, allowing a more efficient use of resources in both application-led material search and material-led application search.
Box: analogy with commercial recommendation systems In commercial recommendation systems, observed (item, customer) ratings are used to predict missing ratings for the recommendation of items to customers [20]. For example, movie ratings by Netflix users can be stored in a movie-user ratings matrix (rows=movies, columns=users, entries=ratings) [21]. Most entries in the matrix are missing, as (i) each user has rated only a small proportion of the movies and (ii) each movie is rated by only a small proportion of the users. A movie recommendation system imputes the missing (movie, user) ratings using the observed ones (perhaps using features of the movies and users as well) in order to make user-specific recommendations of movies. Thus, our (material, property) values are analogous to (item, customer) ratings in commercial recommendation systems. Two distinctions, however, are that (i) commercial recommendation systems typically have many more customers than items, whereas material recommendation systems have more materials than properties, and (ii) the entries in an item-customer ratings matrix share the same units and scale, whereas the units and scales can vary across the different properties in the material-property matrix. Distinction (i) is inconsequential for low rank models because the model is transpose-equivariant.
Our hypothesis, which would permit accurate matrix completion, is that the NPM-property matrix exhibits a low rank structure [22,23], owing to underlying structural and chemical similarities among both NPMs and gas species that dictate their interactions. A low rank structure implies both NPMs and adsorption properties can be represented by low-dimensional vectors that together express the affinity between an (NPM, property) pair; these latent representations can be machine-learned, jointly, using the observed (NPM, adsorption property) values and then used to impute the missing values [21,24].
Herein, we demonstrate the imputation of missing data in an incomplete material-property matrix through learning low rank models [24]. Particularly, we train low rank models of COF-gas-adsorption-property matrices, pertaining to 560 experimentally-reported COFs [25] and the simulated uptake of CH4, H2O, H2S, Xe, Kr, CO2, N2, O2, and H2 at various conditions [18] that apply to different gas storage and separation applications of COFs. Advantageously, this COF-gas-adsorption-property matrix is in reality complete, allowing us to artificially introduce different fractions of missing values and investigate the effect of sparsity on the performance of the recommendation system. From the observed (COF, gas adsorption) values, the low rank model machine-learns low-dimensional latent vector representations of both the COFs and the adsorption properties, allowing the (i) imputation of the missing values of the adsorption properties and (ii) drawing of a COF "map" that clusters together COFs with similar adsorption properties.

Computational methods for materials discovery
Virtual screenings of NPMs for adsorption-based applications use molecular models and simulations [26,27] to [cheaply, relative to conducting an experiment in the lab] predict the adsorption property of each candidate material [16,28,29]. As opposed to an exhaustive virtual screening, genetic algorithms [30][31][32] and Monte Carlo tree search [33] have been used to more efficiently search for the NPM(s) with the optimal adsorption property. Supervised machine learning models have been widely used to predict the adsorption properties of the NPMs at a lower cost than, but with similar fidelity to, the molecular simulations [34][35][36][37][38][39][40][41]. In this approach, (i) molecular simulations are used to label a small subset of the candidate NPMs with the gas adsorption property, (ii) these examples are used to train a machine learning model to predict the gas adsorption of an NPM from cheaply computed structural and chemical features, and then (iii) the supervised machine learning model is used, as a surrogate model for the molecular simulations, to (cheaply) predict the properties of the remaining materials. See reviews in Refs. [42][43][44][45][46][47][48]. Limitations of such a task-specific supervised learning approach are (a) knowledge is not transferred between tasks, (b) the correlation between different targets is not leveraged, and (c) a task-specific, information-rich feature vector representation of the NPM is required. However, transfer learning [49] and graph neural networks [50][51][52] alleviate limitations (a) and (c), respectively.
Unsupervised machine learning holds promise for materials discovery by clustering together materials with similar structures and thus properties and learning a low-dimensional (e.g., 2D) embedding of their structures into a "map" of materials [53][54][55]. Such a map facilitates lead-optimization approaches to materials discovery and the selection of structurally diverse sets of materials to adequately explore materials space.
Finally, autoencoders enable inverse design [56][57][58], where one specifies a desired adsorption property, and the machine learning model generates a NPM structure with that property.

Placing our work in context
Our material recommendation system deviates from previous data-driven approaches to predict adsorption properties of NPMs by combining observations of many different properties to impute missing ones without using explicitly hand-crafted [71][72][73] features of the NPMs. Instead, from the observed (NPM, adsorption property) values, our recommendation system jointly machine-learns latent representations of both the NPMs and adsorption properties that express (NPM, adsorption property) affinities, taking advantage of low rank structure in the material-property matrix. In some sense, our recommendation system is related to multi-task learning, but it (i) does not require a hand-crafted or machine-learned vector representation of the NPM and (ii) handles missing values in the target vectors associated with the NPMs.
Our work is not the first machine-learned recommendation system for use in the chemical sciences. Yuana et al. [74] imputed missing gas permeability in polymers. Sosnia et al. [75] developed a recommendation system for antiviral drugs by learning a low rank model of a compound-virus activity matrix. Seko et al. [76] and Hayashi et al. [77] used matrix/tensor factorization to predict stability based on composition and optimal processing conditions, respectively, of inorganic materials.

The material recommendation system
Here, we formulate the general problem of material-property matrix completion. A material recommendation system jointly machine-learns, from observed (material, property) values, low-dimensional latent vector representations of the materials and properties that express (material, property) affinities. These learned representations allow us to (i) impute the missing (material, property) values and (ii) draw a map of the materials, wherein materials with similar properties congregate.
The data. We have observations of A_mp ∈ ℝ, the value of property p in material m, for (m, p) ∈ Ω ⊂ {1, 2, ..., M} × {1, 2, ..., P}, which defines Ω as the set of ordered pairs indexing the entries of A that are observed. That is, the material-property matrix A ∈ ℝ^(M×P), whose entry (m, p) is A_mp, is not complete; some entries are missing (|Ω| < MP).
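For concreteness, the observed set Ω can be represented as a boolean mask over the matrix. Below is a minimal NumPy sketch (the paper's actual code is written in Julia with LowRankModels.jl; all names and values here are illustrative):

```python
import numpy as np

# Toy incomplete material-property matrix: M = 4 materials, P = 3 properties.
# Missing (unobserved) entries are encoded as NaN; Omega is the set of
# observed (m, p) index pairs.
A = np.array([
    [1.0,    np.nan, 0.3],
    [np.nan, 2.1,    0.7],
    [0.9,    1.8,    np.nan],
    [1.2,    np.nan, 0.4],
])

observed = ~np.isnan(A)                   # boolean mask of Omega
Omega = list(zip(*np.nonzero(observed)))  # [(m, p), ...] observed pairs

print(len(Omega), A.size)  # prints: 8 12, i.e., |Omega| < M*P
```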
The objective. The objective is to complete the material-property matrix by predicting the missing entries.

The low rank model. From an element perspective, the low rank model assumes that each element of the matrix, A_mp, decomposes as

    A_mp ≈ m_m⊤ p_p + μ_m,    (1)

where m_m ∈ ℝ^k and p_p ∈ ℝ^k are low-dimensional (k < M, P) latent vector representations of material m and property p, respectively, and μ_m ∈ ℝ is a bias for material m. The material-property interaction term, the dot product m_m⊤ p_p, represents the "affinity" (if positive) or "aversion" (if negative) of material m for property p. Geometrically, the interaction term is positive (negative) if m_m and p_p point in roughly the same (opposite) direction. The magnitude of the interaction term depends on both the angle between m_m and p_p and their norms. The material bias μ_m reflects variation of the values of the properties of material m independent of interactions; some materials may simply tend to have higher or lower values of the properties. See Koren et al. [21]. From a matrix perspective, the low rank model factorizes the material-property matrix A as

    A ≈ M⊤P + μ1⊤,    (2)

with the columns of matrices M ∈ ℝ^(k×M) and P ∈ ℝ^(k×P) containing the latent representations of materials and properties, respectively; the entries of the column vector μ ∈ ℝ^M containing the material biases; and 1 ∈ ℝ^P a column vector of ones.
See Fig. 2. The dimensionality of the latent space, k < M, P, imposes the constraint rank(M⊤P) ≤ k; hence, eqn. 2 is a low rank model/approximation of the matrix A.
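The element-wise form (eqn. 1) and the matrix form (eqn. 2) are numerically equivalent, which a short NumPy sketch can verify (sizes and names here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_materials, n_properties = 2, 5, 3

M = rng.normal(size=(k, n_materials))    # columns: latent material vectors m_m
P = rng.normal(size=(k, n_properties))   # columns: latent property vectors p_p
mu = rng.normal(size=n_materials)        # material biases

# Matrix form (eqn. 2): A_hat = M^T P + mu 1^T
A_hat = M.T @ P + mu[:, None]

# Element form (eqn. 1): A_hat[m, p] = m_m . p_p + mu_m
m, p = 3, 1
assert np.isclose(A_hat[m, p], M[:, m] @ P[:, p] + mu[m])
print(A_hat.shape)  # prints: (5, 3); note rank(M^T P) <= k = 2
```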
The utility of the low rank model. The low rank model of the materials-property matrix is useful for two purposes [24].
(1) Imputation of missing entries. The decomposition in eqn. 1 holds for both observed and unobserved (material, property) values. Thus, once we learn M, P, and µ from the observed entries, we can predict the unobserved entries, as is clear from eqn. 2.
(2) Construction of a low-dimensional map of the materials and properties. The rows of a fully observed version of A, which lie in a P-dimensional vector space, can be viewed as feature vectors of the materials. In this view, each material is represented by the list of its properties. The latent vector representations of the materials, the columns of M, are embeddings/compressions of the rows of A into a lower (k < P)-dimensional vector space [24]. Within this latent space, materials, represented by the m_m's, that tend to have similar (dissimilar) properties congregate (separate). Thus, with the latent representations of the materials, the m_m's, we can (i) use clustering algorithms to group together materials with similar properties and (ii) visualize the scatter of the materials in the low-dimensional space to make a "map" of materials.
Similarly, the columns of a complete version of A can be viewed as vector representations of the properties, and the columns of P, the latent vector representations of the properties, are embeddings/compressions of them. Within this latent space, properties, represented by the p_p's, that tend to take on similar (dissimilar) values in NPMs congregate (separate).
As a consequence of the dot product m_m⊤ p_p in eqn. 1, the magnitudes and directions of a pair of latent material and property vectors (m_m, p_p), taken together, indicate their affinity/aversion for each other, since m_m⊤ p_p = ‖m_m‖₂ ‖p_p‖₂ cos φ, with φ the angle between m_m and p_p.
Machine-learning the low rank model. We learn the latent representations of the materials and properties and the material biases by balancing (i) the matching of the observed values of the matrix by the model given in eqn. 1 and (ii) the complexity of the latent vector representations, to avoid overfitting. Specifically, we aim to choose the M, P, and μ that minimize the loss ℓ = ℓ(M, P, μ):

    ℓ(M, P, μ) = (1/|Ω|) Σ_{(m,p)∈Ω} (A_mp − m_m⊤ p_p − μ_m)² + λ [ (1/M) Σ_{m=1}^{M} ‖m_m‖₂² + (1/P) Σ_{p=1}^{P} ‖p_p‖₂² ].    (3)

The first term is the approximation error, measured over all observed (m, p) pairs. The second term provides L2 regularization of the latent vector representations of the materials and properties to prevent overfitting and improve generalization, where λ > 0 is the regularization parameter. The sums are normalized by the number of elements in the sum to properly weigh the regularization of the latent material and property vectors.
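A direct NumPy transcription of this loss might look as follows. This is a sketch: the normalization of each sum by its number of terms is our reading of the text, and the paper's actual implementation is LowRankModels.jl in Julia.

```python
import numpy as np

def loss(A, observed, M, P, mu, lam):
    """Loss of eqn. 3: mean squared approximation error over the observed
    entries, plus L2 regularization of the latent material and property
    vectors, each sum normalized by its number of terms."""
    A_hat = M.T @ P + mu[:, None]
    err = ((A - A_hat)[observed] ** 2).sum() / observed.sum()
    reg = lam * ((M ** 2).sum() / M.shape[1] + (P ** 2).sum() / P.shape[1])
    return err + reg

# Sanity check: an exact fit with lam = 0 gives zero loss.
rng = np.random.default_rng(0)
M = rng.normal(size=(2, 4))
P = rng.normal(size=(2, 3))
mu = np.zeros(4)
A = M.T @ P + mu[:, None]
observed = np.ones_like(A, dtype=bool)
print(loss(A, observed, M, P, mu, lam=0.0))  # prints: 0.0
```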
Either stochastic gradient descent or alternating minimization can be used to find the (M, P, μ) that minimize ℓ. The latter alternates between fixing M and optimizing P, and fixing P and optimizing M. See Refs. [21,24].
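To illustrate the alternating approach, here is a minimal NumPy sketch that alternates closed-form ridge-regression solves for the material and property factors. It is a simplified stand-in for the alternating proximal gradient descent used by LowRankModels.jl, not the paper's implementation:

```python
import numpy as np

def als_complete(A, observed, k=2, lam=0.1, n_iters=100, seed=0):
    """Fit the low rank model A ~ M^T P + mu 1^T to the observed entries only,
    by alternating ridge-regression solves over materials and properties."""
    rng = np.random.default_rng(seed)
    n_m, n_p = A.shape
    M = rng.normal(scale=0.1, size=(k, n_m))
    P = rng.normal(scale=0.1, size=(k, n_p))
    mu = np.zeros(n_m)
    I = np.eye(k)
    for _ in range(n_iters):
        for m in range(n_m):            # fix P; update m_m and mu_m
            obs = observed[m]
            X, y = P[:, obs].T, A[m, obs] - mu[m]
            M[:, m] = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
            mu[m] = (A[m, obs] - M[:, m] @ P[:, obs]).mean()
        for p in range(n_p):            # fix M, mu; update p_p
            obs = observed[:, p]
            X, y = M[:, obs].T, A[obs, p] - mu[obs]
            P[:, p] = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
    return M, P, mu

# Demo: recover a synthetic rank-2 matrix (plus biases) with ~30% hidden.
rng = np.random.default_rng(1)
A_true = rng.normal(size=(2, 30)).T @ rng.normal(size=(2, 10)) \
         + rng.normal(size=(30, 1))
observed = rng.random(A_true.shape) > 0.3   # ~70% observed
M_hat, P_hat, mu_hat = als_complete(A_true, observed)
A_hat = M_hat.T @ P_hat + mu_hat[:, None]
print(np.corrcoef(A_hat[~observed], A_true[~observed])[0, 1])
```

On this noiseless synthetic example, the correlation between imputed and true hidden entries is close to 1.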
There are two hyperparameters in the low rank model: (1) k ∈ {0, 1, ..., min(M, P )}, the dimensionality of the vector space containing the latent representations of the materials and properties and (2) λ ∈ [0, ∞), the regularization parameter that trades off prediction accuracy on the training data and the complexity of the latent vector representations.

Case study: a COF recommendation system
We now demonstrate a material recommendation system based on a low rank matrix model. Here, the materials are COFs, and the properties are the equilibrium uptakes of a variety of gases at different conditions, obtained from molecular simulations. The Julia code to reproduce all of our work is available at github.com/SimonEnsemble/material_recommendation_system.

The dataset
We leverage an open data set of simulated gas adsorption properties in M = 560 experimentally reported, structurally optimized, and porous COF materials [18,25]. We selected P = 16 conditions that apply to different gas storage and separation applications. We log₁₀-transformed the Henry coefficients because of the relatively long tails of their distributions. The resulting COF-adsorption-property matrix, A_complete ∈ ℝ^(560×16), is fully observed, allowing us to study the effect of the fraction of missing entries on the performance of the low rank model.

Simulating the process of data collection
We simulate the stochastic process of incomplete data collection to construct an incomplete COF-adsorption-property matrix A^(θ) (still 560 × 16) with a fraction θ of missing entries. We construct A^(θ) by (uniform) randomly sampling, without replacement, θMP of the MP entries of A_complete to ablate (change to missing). Fig. 4 visualizes a resulting incomplete COF-adsorption-property matrix A^(0.4), with a fraction θ = 0.4 of missing entries.
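This ablation procedure is straightforward to express in code; a NumPy sketch (illustrative, not the paper's Julia implementation):

```python
import numpy as np

def ablate(A_complete, theta, seed=0):
    """Simulate incomplete data collection: hide a fraction theta of the
    entries (set them to NaN), sampled uniformly without replacement."""
    rng = np.random.default_rng(seed)
    A = A_complete.astype(float).copy()
    n_missing = int(round(theta * A.size))
    idx = rng.choice(A.size, size=n_missing, replace=False)
    A.flat[idx] = np.nan
    return A

A_complete = np.arange(20.0).reshape(5, 4)   # toy fully observed matrix
A_theta = ablate(A_complete, theta=0.4)
print(np.isnan(A_theta).mean())  # prints: 0.4
```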

Standardization of adsorption properties
We standardize the adsorption properties (the columns of A^(θ)) to have mean zero and unit variance using only the observed training examples. Standardization accounts for the different scales of the different properties and prevents properties with a larger variance from dominating the loss function in eqn. 3. See Ref. [24] for theoretical arguments for standardization. The entries in Fig. 4 are standardized, hence the diverging colormap.
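A sketch of per-column standardization that computes the statistics only from designated (observed, training) entries; names are illustrative:

```python
import numpy as np

def standardize_columns(A, train_mask):
    """Standardize each property (column) to zero mean and unit variance,
    with statistics computed from the entries flagged in train_mask only
    (in the paper's protocol, the observed training examples)."""
    n_p = A.shape[1]
    means = np.array([A[train_mask[:, p], p].mean() for p in range(n_p)])
    stds = np.array([A[train_mask[:, p], p].std() for p in range(n_p)])
    return (A - means) / stds, means, stds

rng = np.random.default_rng(0)
A = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
A[rng.random(A.shape) < 0.4] = np.nan   # simulate missing entries
train_mask = ~np.isnan(A)               # here: all observed entries
A_std, means, stds = standardize_columns(A, train_mask)
# Observed entries of each column now have mean ~0 and std ~1.
print(np.nanmean(A_std, axis=0).round(6), np.nanstd(A_std, axis=0).round(6))
```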

Training, hyperparameter tuning, and testing
We use LowRankModels.jl [24] in the Julia programming language [78] to train our low rank models of the form in eqn. 2. LowRankModels.jl implements an alternating proximal gradient descent [24] to minimize the loss in eqn. 3.
For training and hyperparameter (k, λ) tuning, we randomly partitioned the [simulated] observed entries of A^(θ) into an 80/20 % training/validation split. The loss in eqn. 3 is minimized over the training set, while the validation set is used to select the optimal hyperparameters. The remaining, [simulated] unobserved entries serve as test data to estimate the generalization error of the low rank model for matrix completion.

Figure 3: The distribution of (diagonal) and pairwise relationships between (off-diagonal) the simulated gas adsorption properties of the COFs (data from Ref. [18]). Each point represents a COF. Each property was standardized to have zero mean and unit variance.
To determine the optimal hyperparameter tuple (k_opt^(θ), λ_opt^(θ)) for a given A^(θ), we perform a hyperparameter sweep over a (k, λ) grid, training one low rank model for each (k, λ). We select (k_opt^(θ), λ_opt^(θ)) as the hyperparameter tuple whose low rank model produces the lowest approximation error over the validation set; the deployment model with these hyperparameters is then retrained on all of the observed entries.
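The selection logic of such a sweep can be sketched as follows. For brevity, this stand-in sweeps only the latent dimension k and uses a mean-filled truncated SVD as a toy trainer, whereas the paper trains a full low rank model for each (k, λ) pair with LowRankModels.jl:

```python
import numpy as np

def fit_truncated_svd(A_filled, k):
    """Rank-k approximation of a (mean-filled) matrix via truncated SVD."""
    U, s, Vt = np.linalg.svd(A_filled, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def val_rmse(k, A_filled, A_true, val_mask):
    A_hat = fit_truncated_svd(A_filled, k)
    return np.sqrt(((A_hat - A_true)[val_mask] ** 2).mean())

# Synthetic rank-2 ground truth; hold out ~20% of entries for validation.
rng = np.random.default_rng(1)
A_true = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 8))
val_mask = rng.random(A_true.shape) < 0.2
A_train = np.where(val_mask, np.nan, A_true)
A_filled = np.where(np.isnan(A_train), np.nanmean(A_train, axis=0), A_train)

# Sweep the latent dimension k; keep the k with the lowest validation RMSE.
errors = {k: val_rmse(k, A_filled, A_true, val_mask) for k in range(1, 6)}
k_opt = min(errors, key=errors.get)
print("selected k =", k_opt)
```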

Imputing missing entries
We judge the performance of the low rank model for imputing the missing entries of the COF-adsorption-property matrix A^(0.4) by comparing the predictions of the missing entries to the actual values in the test data set, composed of the simulated unobserved entries.
The parity plot in Fig. 5 shows the joint distribution of predicted and actual values of the (standardized and, in the case of Henry coefficients, log₁₀-transformed) adsorption properties in the test data set, i.e., the simulated unobserved entries of A^(0.4). The density is greatest along the diagonal line of equality, indicating that the recommendation system is providing predictive value. The RMSE and Spearman's rank correlation coefficient on the test data are 0.6 and 0.77, respectively.
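Both test metrics are simple to compute; a NumPy sketch (the rank correlation here ignores ties; a library routine such as scipy.stats.spearmanr handles them properly):

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(((y_true - y_pred) ** 2).mean())

def spearman_rho(y_true, y_pred):
    """Spearman's rank correlation: Pearson correlation of the ranks.
    (No tie handling; fine for continuous adsorption properties.)"""
    r1 = np.argsort(np.argsort(y_true)).astype(float)
    r2 = np.argsort(np.argsort(y_pred)).astype(float)
    return np.corrcoef(r1, r2)[0, 1]

# Toy (actual, predicted) values for five held-out entries.
y_true = np.array([0.1, 0.5, 0.2, 0.9, 0.4])
y_pred = np.array([0.2, 0.6, 0.1, 1.1, 0.5])
print(round(rmse(y_true, y_pred), 3), spearman_rho(y_true, y_pred))
```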
The ultimate utility of the recommendation system is to rank COFs according to specific properties (for specific applications). Spearman's rank correlation coefficient, ρ (here, over a ranking of COFs), between the prediction of a missing adsorption property by the deployment low rank model and its actual value (from the test set) is shown for each adsorption property in Fig. 6. With the exception of H2O Henry coefficients, the recommendation system ranks the COFs according to their properties reasonably well, with ρ > 0.6. The relatively poor ranking of COFs by the H2O Henry coefficient is explained by its very weak correlation with the other properties (see Fig. S1).
As a baseline to judge the performance of our recommendation system, we also train and test (on the same data) a material bias model, i.e., the k = 0 version of eqn. 1 that predicts each missing property of a COF from its material bias alone. By comparing this baseline with the k > 0 low rank model, we quantify the extent to which the interactions between the COFs and the gas adsorption properties, encoded in the m_m⊤ p_p terms for k > 0, are useful in the recommendation system for imputing the missing values. For each adsorption property, the stars in Fig. 6 show Spearman's rank correlation coefficients between the value of the missing property (from the test set) and the prediction of the missing property by the benchmark material bias model. Indeed, the interaction term enhances the ability of the recommendation system to rank COFs according to their adsorption properties, though by different margins depending on the property. O2 adsorption at (298 K, 5 bar) and N2 adsorption at (300 K, 0.001 bar) are the two properties for which the interaction term plays only a marginal role. Overall, this indicates that our recommendation system is (i) learning interactions between COFs and the adsorption properties and (ii) more likely to suggest high-performing COFs for an application than a simpler strategy that selects COFs purely based on how they perform on average (as in the material bias model).

The COF biases
The learned material bias of COF m, μ_m ∈ ℝ, in eqn. 1 roughly describes the typical value of the (standardized) gas adsorption properties of COF m. Visualization of μ can give us an idea of which COFs tend to exhibit the largest and smallest values of the gas adsorption properties. See Fig. 7.

The learned map of COFs and gas adsorption properties
The learned latent representation of COF m, m_m ∈ ℝ^k, encodes its adsorption properties in a low-dimensional latent space. To visualize the map of COFs and adsorption properties, we resort to a dimension reduction method, Uniform Manifold Approximation and Projection (UMAP) [79], which embeds the latent representations of the COFs and properties, contained in the columns of M and the columns of P, respectively, into a 2D space. N.b., we apply UMAP to the horizontally concatenated matrix [M P], as opposed to M and P separately, so that the 2D representations of the material and property vectors are comparable. Fig. 8 shows the resulting map. In summary, Fig. 8 illustrates that the recommendation system machine-learns, from incomplete data, a meaningful map of COFs, wherein COFs with similar adsorption properties congregate.

The effect of the fraction of missing entries θ on performance
Because the COF-adsorption-property matrix A_complete from Ref. [18] is in reality complete, we have the luxury of studying the impact of the fraction of missing entries, θ, on the performance of the recommendation system. This investigation is important to address the practical question: how complete must the COF-adsorption-property matrix be for the recommendation system to reliably rank COFs according to their adsorption properties? For each value of θ, we simulated the process of incomplete data collection multiple times; for each simulated A^(θ), we conducted a hyperparameter sweep using a training/validation split of the observed entries, retrained a deployment model on all observed entries, then tested the deployment model on the unobserved (missing) entries serving as test data. Fig. S2 shows the distribution (among the simulations of data collection) of optimal hyperparameters (k_opt^(θ), λ_opt^(θ)) for each θ. Fig. 9 shows Spearman's rank correlation coefficient between the predicted and actual values of the missing entries as a function of θ.

Conclusion and Discussion
In materials science, we are often interested in many different properties of many different materials. The corresponding material-property matrix often, in practice, has many missing values, since every property of every material has not been measured. The idea of a material recommendation system is to leverage the observed (material, property) values to impute the missing ones. The (material, property) values are mathematically analogous to (item, customer) ratings in commercial recommendation systems.
We demonstrated a COF recommendation system for different gas adsorption applications. Our COF-adsorption-property matrix was composed of the simulated uptake of several gases at different conditions in 560 COF structures by Ongari et al. [18]. We simulated the process of data observation by artificially introducing missing values into the matrix. The (simulated) unobserved entries served as test data to assess the performance of the data imputation by the recommendation system. To both (i) impute the missing adsorption properties and (ii) machine-learn a "map" of COFs, wherein COFs with similar adsorption properties congregate, we trained a low rank matrix model [24] of the COF-adsorption-property matrix that had missing entries. The recommendation system was able to rank COFs according to their adsorption properties reasonably well (Spearman's rank correlation coefficient > 0.6), with the exception of water Henry coefficients. Moreover, coloring of the learned map of COFs by the adsorption properties indicated that, indeed, COFs with similar (dissimilar) adsorption properties clustered together (separated) in the map. The imputation performance of the recommendation system precipitously drops once the fraction of missing entries exceeds 60 %, though this figure does not necessarily generalize to other data sets.
We conclude that material recommendation systems, if sufficient training data is available, could be widely useful for leveraging measured properties of materials to fill in missing measurements. In turn, this could accelerate the matching of materials for specific applications.
The success of a recommendation system for NPMs is, however, predicated on structured, open databases of NPMs and their adsorption properties. One such database is the NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials [68] (NIST-ISODB), which has collected and compiled gas adsorption measurements in NPMs from the literature, from both experimental and simulation sources, for a variety of gases at a wide range of conditions. We originally set out to develop a recommendation system using Henry coefficients extracted from this experimental data, but we found the resultant recommendation system was unable to reliably rank NPMs according to their adsorption properties. Particularly, we found the recommendation system with an interaction term included could not outperform the baseline material bias model. We propose four explanations for the poor performance of our recommendation system based on an NPM-adsorption-property matrix from NIST-ISODB; the explanations include both data-centric and model-centric concerns. First, the Henry coefficient matrix we constructed based on NIST-ISODB was only ∼20 % complete (i.e., too many missing values), which may have limited the success of recommendations for the remainder of the matrix. (Recall that the COF-adsorption-property matrix needed to be at least 40 % complete to satisfactorily make recommendations for the remainder of the matrix. This benchmark for the COF recommender system may not generalize to a broad set of NPMs and properties, but is nonetheless informative.) Second, the NPM-adsorption-property matrix may have included too much noise for a successful recommendation. It is known from meta-analyses of isotherms catalogued in NIST-ISODB [80,81] that experimentally measured gas adsorption isotherms in NPMs exhibit high variance; this variance is ultimately manifested as noise that limits the success of the recommendation system.
Third, the accuracy and reliability of Henry coefficients obtained from isotherms in NIST-ISODB are naturally limited by the source of data in the database itself. NIST-ISODB is primarily constructed from manual extraction of isotherm data from graphical figures in literature articles, since it is not common practice in the adsorption community to provide gas adsorption measurements as raw tabular data in publications. (As of the time of writing this work, only 1.3 % of isotherms in NIST-ISODB were from tabular data sources.) Consequently, the adsorption isotherm data loses precision first when the data is plotted graphically and second by human error when the graphical figure is digitized back to numerical data. This loss of precision particularly affects the generation of Henry coefficients from figures, as those coefficients are especially dependent on low-pressure data, which is often difficult to extract from isotherms plotted on a linear pressure scale. Fourth, our low rank model in eqn. 1 is linear; a non-linear model may be able to capture relationships between the adsorption properties and achieve better performance. The third issue can be addressed by community adoption of the practice of releasing raw adsorption isotherm data in standardized, structured formats (cf. the CIF standard for crystallography [82]), which has been discussed previously [28,83,84].
We introduced missing entries in the (in reality, fully observed) COF-adsorption-property matrix by (uniform) randomly selecting entries to ablate. In practice, however, (i) some properties are more commonly measured than others, (ii) some materials are more commonly studied than others owing to e.g. ease of synthesis, and (iii) there are likely correlations between and temporal trends with the binary random variables that represent whether the (material, property) values are observed.
To expand on (iii), for example, a material with a superior (inferior) value of a desired property may become popular (unpopular) for measurements of other properties. Future work entails (a) creating a model for the selection bias in selecting materials for measurements of properties and (b) accounting for the selection bias in the recommendation system [85].
Another interesting direction for future work is to determine what (material, property) measurements should be made next to most improve the recommendation system, in an active learning strategy [86].
As a remark, recommendation systems suffer from the cold start problem [20]: if a new material is reported, but none of its properties have been observed, the recommendation system is unable to make a prediction about any of its properties. To learn the latent material vector of this material, m_(M+1), using the loss in eqn. 3, we must have an observation of at least one property of the new material.
To (i) improve the performance of the recommendation system and (ii) alleviate the cold start problem, we propose to include structural and chemical properties of the materials that contribute to the prediction, in addition to the observed adsorption properties. For example, we could include in the model other information about the NPM structures, such as the void fraction, surface area, percent carbon atoms, etc. In the analogy with movie recommendation systems, this corresponds to including features about the movies, such as the genre, director, year of production, and actors. These features could be added as additional (fully observed) columns in the material-property matrix, A.
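Appending such fully observed feature columns is mechanically simple; a NumPy sketch with hypothetical (randomly generated) property and feature values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Incomplete COF-adsorption-property matrix (NaN = missing); values are
# random placeholders, not the data set of the paper.
A = rng.normal(size=(560, 16))
A[rng.random(A.shape) < 0.4] = np.nan

# Hypothetical, fully observed structural/chemical features of the COFs
# (e.g., void fraction, surface area, % carbon), standardized like the
# adsorption properties so no column dominates the loss.
feats = rng.normal(size=(560, 3))
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)

A_aug = np.hstack([A, feats])   # 560 x (16 + 3); feature columns have no NaNs
print(A_aug.shape)  # prints: (560, 19)
```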
The material recommendation system is practically useful for recommending (i, application-led material search [1]) a material that optimizes a specific property or (ii, material-led application search [18]) an application for a given material. To motivate an experimental measurement in the lab, it may be necessary to quantify the uncertainty associated with a property imputed by the recommendation system. We remark that one could achieve this through bootstrapping and training an ensemble of recommendation systems on the bootstrap samples of observed (material, property) values.
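A sketch of this bootstrap idea, with a trivial per-property-mean model standing in for the low rank model (the stand-in model, names, and values are illustrative; a real run would refit the model of eqn. 1 on each resample):

```python
import numpy as np

def bootstrap_predictions(triples, fit, predict, target, n_boot=50, seed=0):
    """Resample the observed (material, property, value) triples with
    replacement, refit on each resample, and collect the ensemble's
    predictions for the entry `target` = (m, p); the spread of the
    ensemble serves as an uncertainty estimate for the imputed value."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(triples), size=len(triples))
        model = fit([triples[i] for i in idx])
        preds.append(predict(model, *target))
    return np.array(preds)

# Stand-in model: per-property means, with a global-mean fallback in case a
# property is absent from a bootstrap resample.
def fit(sample):
    per_p = {}
    for (_, p, v) in sample:
        per_p.setdefault(p, []).append(v)
    return {"per_p": {p: np.mean(vs) for p, vs in per_p.items()},
            "all": np.mean([v for (_, _, v) in sample])}

def predict(model, m, p):
    return model["per_p"].get(p, model["all"])

triples = [(m, p, float(m + p)) for m in range(6) for p in range(3)]
ens = bootstrap_predictions(triples, fit, predict, target=(0, 1))
print(f"imputed value ~ {ens.mean():.2f} +/- {ens.std():.2f}")
```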