Reorganization energies of flexible organic molecules as a challenging target for machine learning enhanced virtual screening

The molecular reorganization energy $\lambda$ strongly influences the charge carrier mobility of organic semiconductors and is therefore an important target for molecular design. Machine learning (ML) models generally have the potential to strongly accelerate this design process (e.g. in virtual screening studies) by providing fast and accurate estimates of molecular properties. While such models are well established for simple properties (e.g. the atomization energy), $\lambda$ poses a significant challenge in this context. In this paper, we address the questions of how ML models for $\lambda$ can be improved and what their benefit is in high-throughput virtual screening (HTVS) studies. We find that, while improved predictive accuracy can be obtained relative to a semiempirical baseline model, the improvement in molecular discovery is somewhat marginal. In particular, the ML-enhanced screenings are more effective in identifying promising candidates but lead to a less diverse sample. We further use substructure analysis to derive a general design rule for organic molecules with low $\lambda$ from the HTVS results.


I. INTRODUCTION
By providing fast and accurate predictions of molecular properties, chemical machine learning (ML) has the potential to significantly increase the speed and scope of molecular discovery. [1][2][3] In this context, much attention has been paid to properties that are directly available from single-point electronic structure (e.g. density functional theory, DFT) calculations, such as atomization energies [4][5][6] or molecular orbital energies. 7,8 For established benchmark sets of small molecules like QM9, 9 state-of-the-art ML models now reach extremely high accuracies for such properties, often surpassing the intrinsic error of the reference electronic structure methods.
Despite this success, there remains a gap between the small, rigid molecules in QM9 and technologically or pharmaceutically relevant compounds, which are often larger and much more flexible. Furthermore, the target properties of molecular discovery are in practice seldom simple electronic properties that are directly accessible through single-point DFT calculations. Instead, complex properties like the bulk electronic conductivity, pharmacological or catalytic activity of a molecule are ultimately of interest. 10 Unfortunately, these are extremely complicated to rigorously simulate even for a single molecule. In high-throughput virtual screening (HTVS) studies, it has therefore become common to focus on simplified descriptors that are known to correlate with the property of interest. [11][12][13] Such descriptors include, e.g., the binding energy of a key intermediate in catalysis or the internal reorganization energy (λ) in molecular electronics.
Measuring the energetic cost for charge carriers to move between molecular sites, 14,15 λ provides an important contribution to the charge-carrier mobility in crystalline and amorphous organic semiconductors. 16,17 While computational screening for low-λ molecular structures has successfully guided discovery, 18 its sensitivity to small variations in molecular structure 19 renders targeted molecular design challenging. Fragment- [19][20][21] or rule-based 22,23 design strategies have been proposed to tackle this problem, while virtual screening [24][25][26][27][28][29] or data-efficient 30,31 discovery approaches were used to assess large molecular candidate spaces, albeit without fully capturing the underlying structure-property relationships.
A reliable ML-based prediction of λ could fill exactly this gap -providing significant speed-ups for the assessment of thousands of molecules while potentially allowing for the extraction of robust chemical rules by explainable AI. 32 ML-based approaches were indeed recently successful for the prediction of λ for rigid molecules, 33 while flexible molecules still pose a significant challenge, 34 likely because λ simultaneously depends on two potential energy surfaces (see Fig. 1).
In this contribution we therefore critically study the ML prediction of λ (specifically for hole conduction) as a challenging problem for chemical machine learning. To this end, we present a new dataset of hybrid DFT-level reorganization energies for 10,900 carbon- and hydrogen-containing molecules consisting of up to sixty atoms and five rotatable bonds. A series of Gaussian Process Regression (GPR) 35,36 models are developed for this dataset, both for straightforward structure/property mapping and for Δ-ML 37 using a semiempirical baseline. We find that the conformational freedom of these molecules can introduce significant noise into this inference task, so that the performance of the models is strongly influenced by the conformer sampling method. We further show that significant improvements in predictive performance are achieved by adopting the Δ-learning strategy. Finally, we critically evaluate the usefulness of the obtained ML methods for the discovery of low-λ structures in a diverse chemical space and for deducing molecular design rules.

FIG. 1. Focusing on holes as charge carriers, E0 and E+ are the total energies of the neutral and cationic molecular states, evaluated at the equilibrium geometries R0 and R+ of the respective states. In practice, two equilibrium geometries thus need to be obtained.
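The four-point scheme sketched in Fig. 1 amounts to a simple combination of four total energies. A minimal sketch (the function and argument names are ours; energies are assumed to be given in consistent units, e.g. eV):

```python
def reorganization_energy(E0_R0, E0_Rp, Ep_R0, Ep_Rp):
    """Hole reorganization energy via the four-point scheme.

    E0_R0: neutral-state energy at the neutral equilibrium geometry R0
    E0_Rp: neutral-state energy at the cation equilibrium geometry R+
    Ep_R0: cation-state energy at R0
    Ep_Rp: cation-state energy at R+
    """
    # Relaxation on the cation surface plus relaxation on the neutral surface.
    return (Ep_R0 - Ep_Rp) + (E0_Rp - E0_R0)
```

In practice the four energies come from two geometry optimizations (neutral and cationic) plus two single-point calculations at the crossed geometries.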

II. METHODS

Dataset.
A set of flexible π-conjugated hydrocarbon molecules was generated by successively applying a series of molecular transformation operations to benzene (see Fig. S1), similar to the procedure used in ref. [30]. At each step, these operations modify structural elements in the parent molecule or add additional ones. The set of operations used herein includes biphenyl-conjugation, annelation (5/6-ring) and ring-contraction, among others (see SI for details). Based on these transformations, molecular structures with up to four rings and two linker atoms were randomly generated, leading to 131,810 unique structures. This set forms the virtual screening space for this study. DFT calculations were performed for a subset of 10,900 structures as detailed in the section on structure-based ML models.
While these molecules thus purposely cover a diverse molecular and conformational space, we note that (as with any enumerated chemical dataset) unstable and reactive systems could be contained, and synthesizability should be assessed separately. All chemoinformatics-related tasks were carried out using RDKit 2019.09.03. 39

Reorganization energies. Reorganization energies were calculated for the lowest-energy conformer of each molecule. To determine this conformer, RDKit is first used to compute 2D coordinates for the molecular graph, after which an initial 3D structural guess is obtained and relaxed at the GFN2-xTB level using the xTB program (v6.3.0). 40 The conformational search is then carried out using the iterative meta-dynamics sampling and genetic crossover (iMTD-GC) approach, as implemented in the "Conformer-Rotamer Ensemble Sampling Tool" (CREST). 41 Here, three different settings were compared, as fully detailed in the results section.
For the lowest-energy conformers, reorganization energies were computed at the GFN1-xTB level (λ GFN1 ). Note that GFN1-xTB was chosen instead of its successor (GFN2-xTB) because we found the former to be slightly more reliable in terms of predicting λ and molecular geometries for the systems considered herein (see Figs. S2-3). Electronic descriptor values entering property-based ML models (as detailed in the results section) were also extracted from the results of these calculations. These include frontier orbital energies and their gaps, Fermi levels, total energies and vertical energy differences. Final target λ DFT values were calculated at the B3LYP 42-44 level of theory using the FHI-AIMS 45 code, including the TS dispersion correction. 46 Electronic wave functions were expanded in an extended "tier 1" basis set using "light" integration settings. Note that this level of theory is commonly employed for characterizing organic semiconductors, thus forming a good reference method for this study. 19,25,28,47

ML models. All models presented herein use GPR, a probabilistic machine learning method that allows for the smooth interpolation of property values from data. Specifically, these models infer the underlying relationship between different molecular representations and λ, based on a training set D = {X, y}. Here, X is a matrix consisting of molecular representation vectors x (i) and y is a vector of target properties for the training molecules, with elements y (i) . Predictions for a set of unseen molecular representations X * can then be obtained as the predictive mean

$\mathbf{y}^{*} = K(X^{*}, X)\,\boldsymbol{\alpha},$

where the covariance (or kernel) matrix K with elements K ij = K(x (i) , x (j) ) quantifies the similarity between molecular representations. The coefficients α minimize a regularized least-squares error between property predictions and reference values and can be calculated as

$\boldsymbol{\alpha} = \left(K(X, X) + \sigma_n^2 I\right)^{-1} \mathbf{y},$

where K(X, X) is again a covariance matrix. The hyperparameter σ n incorporates observation noise, in this case, e.g.
related to uncertainty due to conformational sampling (as detailed in the section on conformer sampling). In all models reported herein, the commonly used radial basis function (RBF) kernel is employed:

$K(x^{(i)}, x^{(j)}) = \sigma_f^2 \exp\left(-\frac{d(x^{(i)}, x^{(j)})^2}{2 l^2}\right),$

where l is the kernel length-scale, σ 2 f is the signal variance and d(., .) is the Euclidean distance.
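The GPR equations above can be sketched in a few lines of numpy. This is a generic illustration, not the implementation used in this work; the hyperparameter values are placeholders:

```python
import numpy as np

def rbf_kernel(X, Y, l=1.0, sigma_f=1.0):
    # K_ij = sigma_f^2 * exp(-||x_i - y_j||^2 / (2 l^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / (2.0 * l**2))

def gpr_predict(X_train, y_train, X_test, l=1.0, sigma_f=1.0, sigma_n=0.1):
    # alpha = (K(X, X) + sigma_n^2 I)^{-1} y
    K = rbf_kernel(X_train, X_train, l, sigma_f)
    alpha = np.linalg.solve(K + sigma_n**2 * np.eye(len(X_train)), y_train)
    # Predictive mean: y* = K(X*, X) alpha
    return rbf_kernel(X_test, X_train, l, sigma_f) @ alpha
```

In the small-noise limit (σ n → 0) the predictive mean reproduces the training targets exactly, which is a convenient sanity check for any implementation.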
A series of GPR models are presented herein, which differ in the type of representation and in how the covariance matrix is constructed. The most straightforward of these uses a representation of the molecular geometry of the lowest-energy conformer in the neutral charge state. This representation x (i) s is constructed in two steps. First, each atomic environment is encoded into a rotationally invariant local representation using the smooth overlap of atomic positions (SOAP) 48 as implemented in Dscribe 49 (see Fig. S4 for details). These atomic representations are then combined into molecular representations using the auto-bag method 50 , which partitions the local feature vectors into k max clusters using the k-means algorithm. 51 Each molecular structure can then be encoded by a k max -dimensional global feature vector that counts the occurrence of local environments assigned to each cluster. The effect of the hyperparameter k max on the predictive performance is shown in Fig. S5; we arrive at a converged value of 500. Here, SOAP is only one of the possible choices for representing atomic environments. In fact, there is a range of modern many-body representations, which are closely related to each other and typically display comparable accuracy. 52 To illustrate this, we also considered the Many-Body Tensor Representation of Huo and Rupp. 53 This indeed yields very similar predictive performance for structure-based models (see Fig. S6).
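The auto-bag construction described above can be sketched as follows. The atomic feature vectors are assumed to be precomputed (e.g. by SOAP); the minimal Lloyd's k-means below stands in for a production clustering routine, and all names are ours:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    # Minimal Lloyd's algorithm: returns k cluster centers.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def auto_bag(atomic_reprs, centers):
    # Assign each local environment to its nearest cluster and count the
    # occurrences: yields a k-dimensional global feature vector per molecule.
    labels = np.argmin(((atomic_reprs[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    return np.bincount(labels, minlength=len(centers))
```

The clusters are fitted once on the pooled local environments of the training set; each molecule is then encoded by its bag-of-environments histogram.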
Note that above we introduced the subscript s to refer to the use of structure-based molecular representations; the corresponding baseline ML model is denoted K s . Furthermore, a model termed K p based on electronic properties computed at the semiempirical GFN1-xTB level was developed, with the corresponding representation x p (i) (see below for details). Finally, a model K sp is explored, which combines the two kernels.

It should be noted that the choice of the ML method can in principle have a strong influence on the predictive accuracy. For the case of molecular reorganization energies, Abarbanel and Hutchison therefore performed an extensive comparison of different regression approaches (e.g. using kernel, decision tree and neural network based methods), finding little difference between different ML approaches. 34 To confirm this insensitivity, we also trained a decision tree based AdaBoost 55 model on the current data set and indeed found little difference to the GPR approach used herein (see Fig. S7).

III. RESULTS
Conformer sampling. The hydrocarbon dataset presented herein contains molecules with diverse structural elements (see Fig. 2a for 10 randomly selected examples). While the enumerated 2D molecular graphs contain information on molecular bonding, they do not fully determine the molecular geometry, e.g. with respect to relative configurations around rotatable single bonds. As an example, 115,888 (53,046) of the contained molecules incorporate at least 2 (4) rotatable bonds, with a maximum of 5 rotatable bonds occurring overall. We thus expect significant conformational flexibility for these molecules.
This flexibility can influence the ML predictions of λ in two ways. First, the reference λ values may depend on the conformer, and flexible molecules display much larger conformational variety. Second, the ML prediction of λ is based on a representation derived from a 3D molecular geometry. For highly flexible molecules, we can expect significantly larger deviations between the geometries predicted with more approximate levels of theory and high-level references. This is known to adversely impact the accuracy of ML models. 56 To arrive at an internally consistent procedure when comparing among different molecular systems, we therefore focus on the lowest-energy conformers that we can identify for each molecular system.
Unfortunately, a full conformer search at the DFT level is prohibitively expensive. This means that we require a robust and efficient protocol for the search of low-energy conformers. To this end we rely on semiempirical and force-field methods from the GFN family, which have recently been established for this purpose. These are used in combination with CREST, which implements a purpose-built workflow for conformational search. 41 Depending on the underlying energy function, the accuracy and computational cost of this search can vary significantly, however. We therefore tested three different workflows, denoted as conf1-3.
In our reference method (conf1), we employ CREST in combination with the density functional tight-binding method GFN1-xTB. 40 Performing conformer searches for the 10 molecules of Fig. 2a, we find that between 3 and 90 conformers are identified within the default energy window of 6 kcal/mol (260 meV) above the lowest-energy one, underscoring the conformational flexibility of molecules in our dataset. For these conformer ensembles, we show the wide range of encountered λ DFT values in Fig. 2b. Importantly, there is little variation between the values of λ DFT calculated for the lowest-energy conformers at the GFN1-xTB and DFT level, which suggests that GFN1-xTB conformers are a reliable proxy for the true first-principles ground state geometry. Note that the excellent agreement in Fig. 2b only reflects the quality of GFN1-xTB conformers, while all reorganization energies in this subfigure were calculated at the DFT level. Unfortunately, performing the full conformer search at the GFN1-xTB level is still computationally prohibitive for hundreds of thousands of molecules.

FIG. 3. Effect of improved conformer searches on learning behavior. Learning curves for λDFT and λGFN1 using the conf2 and conf3 conformer search protocols. Training sets for λGFN1 (λDFT) consist of up to 9,900 (880) molecules, with 1,000 (100) unseen data points used to evaluate the predictive errors. Shaded areas indicate the standard deviation for five randomly drawn training sets of each size. Note that the DFT assessment was stopped earlier due to the significantly higher computational cost of the method.
Alternatively, the significantly more efficient force-field method GFN-FF 57 can be used, and the conformer search can be accelerated using the 'quick' setting in CREST (herein termed conf2). For 100 randomly selected molecules, Fig. 2c shows a comparison of λ GFN1 values for the lowest-energy conformers obtained with conf1 and conf2. While the bulk of the predictions falls within the error margins of ± 20 meV, we also find 16 outliers, marked in orange. These can be attributed to an incomplete coverage of conformational space in the conf2 ensemble and to differences in the energetic ranking between GFN1-xTB and GFN-FF.
To address the latter point, in conf3 we therefore combine the higher accuracy of GFN1-xTB with the computational speed of GFN-FF: a conformer ensemble is generated with CREST at the GFN-FF level, while a subsequent local relaxation and energetic re-ranking is carried out using GFN1-xTB. Comparing again to conf1, we see a significantly better agreement between the methods (see Fig. 2d), with 5 remaining outliers falling beyond the error margins of ± 20 meV. It should be noted that conformer searches are in general a difficult global optimization problem, which cannot be solved deterministically in an efficient manner. Therefore, some amount of uncertainty is unavoidable and will affect the ML models in all cases. As discussed in the following, however, achieving lower uncertainty at this stage leads to significantly lower predictive errors.

Structure-based ML models. Having established an efficient conformer search workflow, we now turn to structure-based ML models for predicting λ (K s ). As these models require 3D geometries as inputs, they are well suited to investigate the effect of the conformer search protocols on the ML models themselves, see Fig. 3. Here, learning curves for λ GFN1 and λ DFT are shown. While all models improve with more data, two striking differences can be seen. First, the models using the more accurate conformer search conf3 are consistently better than the ones using conf2. Second, the predictive error is consistently lower for λ GFN1 than for λ DFT .
In part, this can be explained by the smaller range of λ GFN1 values (see next section). However, a fundamental difference between the two targets also exists: while we predict λ GFN1 on the basis of the corresponding neutral-state molecular equilibrium structures, this does not hold for λ DFT . In the latter case, the differing neutral-state equilibrium geometries (between GFN1-xTB and DFT) further complicate the learning task.
It should be noted, however, that learning λ GFN1 is itself only of methodological interest. Indeed, the conf3 search requires GFN1-xTB for energy ranking, which entails a similar computational effort to calculating λ GFN1 . In the following, we therefore exclusively focus on predicting λ DFT , using conf3 for structure generation. To this end we extended our DFT-annotated dataset to cover in total 10,900 molecules, randomly drawn from the full hydrocarbon database. The distribution of obtained λ DFT values is shown in Fig. S8. 1,000 molecules served as an external test set for model validation, while at maximum 9,600 of the remaining 9,900 entered the respective training sets.
Beyond structure-based models. While the above results show that λ DFT can be learned from the structure, the accuracy of the models leaves something to be desired, given that the intrinsic standard deviation of the dataset is ca. 80 meV. To explore how this performance is impacted by molecular flexibility, additional ΔK s models were trained on different subsets of 1000 molecules with a fixed number of rotatable bonds (N rb = 2,3,4,5). These models were then evaluated on test sets with the corresponding N rb (see Fig. S9). We find that models for less flexible molecules are indeed significantly more accurate than those for more flexible molecules. This confirms the notion that molecular flexibility poses a challenge for molecular ML models and underscores our previous point on the highly challenging nature of λ as a target property, e.g. compared to the atomization energy.
Since robust models already require the use of GFN1-xTB for conformer ranking, it is natural to ask whether electronic properties at the GFN1-xTB level could be used to improve them. The most straightforward way to do this is via a Δ-learning 37 strategy, i.e. by learning a correction to λ GFN1 . To this end, we first use a simple linear regression to describe systematic differences between λ DFT and λ GFN1 :

$\lambda_{\mathrm{lin}} = a\,\lambda_{\mathrm{GFN1}} + b.$

This linear model alone yields a stable MAE of 40 meV, independent of the training set size. It thus outperforms the structure-based K s models for all but the largest training sets (see Fig. 4). This means that, contrary to the findings of ref. [34], we find a reasonably good correlation between GFN and DFT based reorganization energies (R 2 = 0.54, see Fig. S10). The discrepancy is likely due to the different class of molecules (thiophene oligomers) considered therein. Defining as a new target property

$\Delta\lambda = \lambda_{\mathrm{DFT}} - \lambda_{\mathrm{lin}},$

we can now build Δ-learning models that further improve on the linear approach. As expected, the Δ-learning variant of K s (termed ΔK s ) indeed performs significantly better than both the linear and the baseline model, approaching an MAE of 30 meV at the largest training set size.

The GFN1-xTB calculations required for obtaining λ GFN1 can also be exploited in a different way. One challenge for the structure-based models is the indirect relationship between the neutral GFN1-xTB geometry and λ DFT . We therefore also explored property-based models (termed K p ) which use frontier orbital energies and gaps, Fermi levels, total energies and vertical energy differences of the neutral and cationic system to construct a representation, as fully detailed in Table S2. The respective ΔK p model is actually slightly better than the corresponding structure-based model ΔK s , despite not including any structural information.
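The Δ-learning target construction, i.e. a linear baseline fitted between the semiempirical and DFT values followed by learning the residual, can be sketched as follows (the function name is ours, and `np.polyfit` stands in for whichever least-squares routine is used):

```python
import numpy as np

def fit_delta_target(lam_dft, lam_gfn1):
    # Linear baseline: lam_dft ~ a * lam_gfn1 + b, fitted by least squares.
    a, b = np.polyfit(lam_gfn1, lam_dft, 1)
    # Delta-learning target: the residual that the ML model has to learn.
    delta = lam_dft - (a * lam_gfn1 + b)
    return a, b, delta
```

At prediction time, the final estimate is the linear baseline evaluated on λ GFN1 plus the ML prediction of the residual.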
Finally, a combined model incorporating the structural and property kernels (termed ΔK sp ), performs better still, reaching an MAE of 25 meV at the largest training set size. Please note that no optimization of the feature selection was performed for the property based models, other than checking that there were no strong linear dependencies between different properties. However, a more systematic feature selection procedure can provide physical insight and potentially improve the models. To explore this, we performed permutational feature importance (PFI) analysis for the ΔK p model (see Fig. S11). 59 This indicates that some features are particularly relevant for the model, e.g. the HOMO energy of the cationic state in the neutral geometry, the Fermi energy of the neutral state in the cation geometry and the individual contributions to the GFN1 reorganization energy. Based on this, we constructed additional models which only used subsets of the most important features. However, these sparse models displayed somewhat worse performance than the full model, indicating that all features ultimately contribute to the prediction accuracy. Nonetheless, more sophisticated feature engineering (e.g. using recursive selection or nonlinear transformations) may be able to achieve better performance with sparse models.
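A permutation feature importance analysis of the kind applied to ΔK p can be sketched generically: shuffle one feature column at a time and record the increase in the error metric. This is an illustrative implementation, not the one used in this work; `predict` is any fitted model:

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    # Importance of feature j = mean increase in MAE when column j is shuffled.
    rng = np.random.default_rng(seed)
    base_mae = np.abs(predict(X) - y).mean()
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy the information in feature j only
            scores.append(np.abs(predict(Xp) - y).mean() - base_mae)
        importances[j] = np.mean(scores)
    return importances
```

Features whose shuffling leaves the error unchanged contribute nothing to the prediction; strongly correlated features can, however, mask each other in this analysis.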
ML-assisted virtual screening. So far, we have seen that in a Δ-ML setting, the presented GPR models can lead to a modest increase in predictive performance relative to a semiempirical baseline method. This raises the question of whether this improvement has a tangible effect on the results of a HTVS for low-λ DFT molecules. To address this issue, we applied ΔK s , ΔK sp (each trained on 9,600 molecules) and GFN1-xTB to screen 120,910 previously unseen molecules for promising candidates. For each model, we extracted 500 candidates with the lowest predicted λ and calculated their actual λ DFT values.
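The screening step itself (selecting the candidates with the lowest predicted λ and then verifying them at the DFT level) reduces to a sort and a threshold count. A sketch with illustrative names, using a 200 meV (0.200 eV) threshold as in the discussion below:

```python
import numpy as np

def screen(lam_pred, lam_true, n_select=500, threshold=0.200):
    # Select the n_select molecules with the lowest *predicted* lambda,
    # then count how many of them actually fall below the threshold
    # according to the reference (e.g. DFT) values, all in eV.
    selected = np.argsort(lam_pred)[:n_select]
    hits = int((lam_true[selected] < threshold).sum())
    return selected, hits
```

In practice `lam_true` is only computed for the selected subset, since that is the whole point of the pre-screening.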
As illustrated in Fig. 5a, all three methods are quite successful in identifying promising candidates: from the 500 selected systems, GFN1-xTB identifies 436 molecules that display λ DFT < 200 meV, compared to the somewhat higher numbers for the ΔK s and ΔK sp models (which identify 487 and 492, respectively). Narrowing the range to λ DFT < 140 meV, the ΔK sp model still performs best and identifies 251 structures, while ΔK s and GFN1-xTB identify 217 and 118 such cases, respectively.
The 20 lowest-λ structures from all three screenings are shown in Fig. 6. Interestingly, 15 compounds in this subset were identified by the GFN1-xTB screening, while the ΔK s and ΔK sp models identified 9 and 11, respectively, falling slightly behind. In other words, the GFN1-xTB model actually has an edge over the ML models when considering the extreme low end of the distribution, although it is in general less effective in identifying low-λ structures. It is also notable that, although some overlap between the methods is observed (i.e. of the 1,500 molecules selected by the three screenings only 1,131 are unique candidates), many structures are exclusively identified by one method, in particular by GFN1-xTB. This is illustrated by the Kernel Principal Component Analysis map, 58 where the candidates selected by the GFN1-xTB model overall exhibit the highest diversity, while the candidates selected by the data-driven models appear somewhat more concentrated. This reflects the fact that GPR models use metrics of molecular similarity in their predictions.
However, this is not primarily a problem of the chosen models, since other ML approaches also (implicitly) work with feature similarity. It is rather that ML models are by definition most strongly influenced by those types of molecules which occur most frequently in the dataset. The HTVS setting does not necessarily require a good description of an average molecule, however. Instead, it requires a good description of the small percentage of unusual molecules that we are interested in. This implies that a non-uniform sampling strategy for training set construction might be helpful in this context. This will be explored in future work.
At the suggestion of a reviewer, the virtual screening was also performed with the ΔK p approach (see Figs. S12-13). This model shows comparable performance to ΔK s for systems with λ < 140 meV, but is considerably worse in the range 140 meV < λ < 200 meV. This indicates that the structural information in ΔK s and ΔK sp helps the models to reliably identify systems that are structurally similar to low-λ training set molecules, thus increasing their screening accuracy.
Substructure Analysis. Given a set of candidates from HTVS like the one in Fig. 6, it is natural to ask what makes these systems such good candidates. If general design rules could be obtained from this set, this would arguably be even more useful than the candidates themselves. Visual inspection indeed points to certain structural motifs that are fairly common, such as cyclopentadiene moieties and acetylene-bridged aromatic rings.
A more quantitative understanding of this can be obtained from a substructure analysis. To this end, we analysed whether certain structural motifs are significantly more likely to be found in the low-λ subset than in the full dataset. This can be quantified via the enrichment of a given substructure i, defined as

$E_i = \frac{n_{i,\mathrm{low}}/N_{\mathrm{low}}}{n_{i,\mathrm{all}}/N_{\mathrm{all}}},$

where n i,low and n i,all are the number of times substructure i is found in the low-λ and full datasets, while N low and N all are the total number of molecules in each dataset. We complement this metric with the frequency of a given substructure in the dataset, defined as

$f_i = \frac{n_{i,\mathrm{all}}}{N_{\mathrm{all}}}.$

To obtain a general design rule, we search for substructures with both high enrichment and reasonably high frequency. This allows balancing between overly specific substructures that only occur in very few molecules to begin with (high enrichment/low frequency) and overly simple motifs that occur in many molecules, independent of λ (low enrichment/high frequency).

As a preliminary screening, potential substructures were defined via Morgan fingerprints 60 of different bond radii (see Fig. 8). As illustrated in Fig. S14, this revealed a number of highly enriched substructures, which confirmed the initial impression that acetylene-bridged and cyclopentadiene-containing structures are highly favourable. However, the substructures obtained in this fashion are often redundant and chemically unintuitive (e.g. by only containing parts of aromatic rings). We therefore manually derived a number of reasonable substructures from this analysis, in order to elucidate a robust and general design rule for low-λ molecules (see Fig. 7). Here, we focused on acetylene-bridged benzene rings, as cyclopentadiene is prone to dimerize in Diels-Alder reactions, pointing to potential stability issues with these molecules.
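Given the substructure match counts (e.g. obtained from RDKit substructure searches), the two metrics defined above are one-liners; a minimal sketch with our own function names:

```python
def enrichment(n_low, N_low, n_all, N_all):
    # How over-represented a substructure is in the low-lambda subset,
    # relative to its occurrence rate in the full dataset.
    return (n_low / N_low) / (n_all / N_all)

def frequency(n_all, N_all):
    # Fraction of molecules in the full dataset containing the substructure.
    return n_all / N_all
```

An enrichment of 1 means the motif is no more common among low-λ molecules than in the dataset at large; the design-rule search looks for motifs that score well on both metrics simultaneously.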
In Fig. 7a, we plot the enrichment and frequency of each substructure. This reveals a trade-off: the simplest structure (1) is very common in the full dataset, but also displays very low enrichment in the low-λ set. In contrast, the more elaborate structures (8) and (9) are highly enriched, but very rare overall. Meanwhile, substructure (5) (two meta-substituted acetylene-bridged benzene rings) features quite high enrichment and is also fairly common in the database. As a consequence, ten further molecules with this motif can be found in the previously computed set of 10,900 λ DFT values. This allows us to confirm that the corresponding molecules indeed display significantly lower reorganization energies than the full training set (Fig. 7c).
The distributions of λ DFT values for all substructures are shown in Fig. 7d. This confirms the impression obtained from the enrichment plots. Simple substructures like (1) are generally unspecific and can be found in both high- and low-λ molecules. Meanwhile, highly enriched substructures indeed robustly indicate high-quality candidates, and can thus be used to define general design rules.
It should be noted that the above analysis is ultimately limited by the biases of the underlying dataset. For example, heteroatomic substituents could affect the suitability of certain motifs quite strongly due to electronic push-pull effects, which are largely absent in the hydrocarbon dataset used herein. Nonetheless, the methodology could of course also be applied to other datasets.

IV. CONCLUSION
In this work we have explored the potential benefits of using ML models to enhance virtual screening studies for molecules with low reorganization energies λ. We find that this is a challenging setting for molecular ML, both because of the conformational flexibility of the studied hydrocarbon molecules and the intrinsic difficulty of predicting λ from the equilibrium geometry alone. Both aspects can be mitigated by using a semiempirical electronic structure method for conformer searching and as a baseline model (provided there is at least a moderate correlation with the target property).
While this leads to a significant improvement of the predictive performance compared to the baseline, we find that the benefits of this are actually somewhat marginal in the context of virtual screening. Specifically, ML enhanced screening is more effective in identifying promising candidates, but the semiempirical model actually has some advantages in terms of candidate diversity. This calls into question whether the cost of building the ML models (in particular the generation of training data) is actually justified. In particular, computing λ DFT for a single molecule takes on average 28 CPU hours on our hardware. In contrast, the generation of conformer ensembles (ca. 1 CPU hour per molecule) and the training of the ML models (one-time cost of 20 CPU hours for the largest training sets) are reasonably affordable. To obtain a clear advantage, more accurate and/or data-efficient ML models are thus required.
One way to achieve this would be to work with full conformer ensembles rather than single conformers to construct the representations. 61 It should also be noted that packing and contact effects occurring in molecular crystals or amorphous structures are known to influence the solid-state conformation encountered and the flexibility available for geometrical relaxation. 26,62,63 Potentially, generative ML models trained on condensed-phase data could therefore help produce more realistic conformer ensembles.

Supporting Information
Details on structure generation, electronic properties, hyperparameters and substructure analysis.