Toward full ab initio modeling of soot formation in a nanoreactor

A neural network (NN)-based model is proposed to construct the potential energy surface of soot formation. Our NN-based model is proven to possess good scalability of O(N) and retain the ab initio accuracy, which allows the investigation of the entire evolution of soot particles with tens of nm from an atomic perspective. A series of NNbased molecular dynamics (NNMD) simulations are performed using a nanoreactor scheme to investigate critical processes in soot formation, acetylene polymerization, and inception of PAH radicals. This shows that NNMD can capture the dynamic process of acetylene polymerization into PAH precursors. The simulation of PAH radicals reveals that physical interaction enhances chemical nucleation, and such enhancement is observed for clusters of πand σ-radicals, which is distinct from the dimer. We also observed that PAH radicals of ~ 400 Da can produce core-shell soot particles at a flame temperature, with a disordered core and outer shell of stacked PAHs, suggesting a potential physically stabilized soot inception mechanism.


Introduction
Soot emitted from human activities leads to respiratory diseases and global warming [1]. A significant number of studies are ongoing to understand the mechanisms responsible for soot formation, which involves precursor chemistry, particle nucleation, and mass/size growth [2,3]. The critical stage of soot formation is the inception process, in which gas-phase aromatics form condensed clusters, resulting in carbonaceous nanoparticles.
Polycyclic aromatic hydrocarbons (PAHs) are widely accepted as precursors for soot. Two major pathways of soot nucleation initiated from PAHs [2,4] are known, namely, physical nucleation of PAHs into stacked clusters [5,6] and chemical nucleation of PAHs into an aromatic network via cross-linking reactions [7][8][9]. Physical nucleation of PAHs is only possible for large PAHs (>667 Da), but these species fail to be observed in experimental studies due to their low concentrations [10]. Chemical nucleation is a process where reactive sites on PAHs are chemically combined to form a stable structure. These reactive sites can be connected by acetylene, which refers to the well-known hydrogen abstraction acetylene addition (HACA) growth mechanism [9]. Another chemical pathway is the formation of a π-bonding network where π-radicals directly react with PAH σ-radicals without the requirement of hydrogen abstraction [11]. These πradicals can be stabilized due to delocalization, also known as long-lived resonantly stabilized radicals (RSRs). The existence of PAH π-radicals in sooting flames has been reported by high-resolution atomic force microscopy (HR-AFM) and photoionization mass spectrometry experiments [11,12]. The reactivity of different sites on a range of PAH radicals was examined by Martin et al. [13]. They found that σradicals are the most reactive, forming bonds with other radicals. Partially saturated rim-based pentagonal rings were found to form localized π-radicals with high reactivities. The forward rate constants of these radicals are calculated by Menon et al. [14]. The maximum rate was 10 12 cm 3 mol -1 s -1 for aryl-radical recombination, which is still insufficient to trigger the nucleation flux for nanoparticle formation [15].
Physical nucleation is rapid but too weak to be stable at flame temperatures, while most chemical nucleation rates are slow. Frenklach and Mebel [15] recently suggested an enhancement of polyaromatic polymerization induced by internal rotors. These rotors were argued to provide multiple opportunities for an aryl radical to attach to the reactive edge of a PAH, such as a rim-based pentagonal ring. A dimerization reaction of π-diradicals was reported to confirm that the physical interaction between PAHs could accelerate chemical nucleation [16]. Such a mechanism, known as physically stabilized soot inception, shows a feasible pathway to explain the fast growth of soot. However, most previous work focuses on the dimerization process. The dynamic evolution between multiple reactive species is still not well understood.
The complexity of the chemical and physical reactions during soot nucleation makes it both numerically and experimentally challenging. Ab initio molecular dynamics (AIMD) simulations have been widely applied to reveal the atomic insights of complex reactive systems based on fundamental equations of quantum and classical mechanics. Although AIMD has been successfully used for polyyne inception [17] and PAH reactive dimerization [16], AIMD calculations of large PAH molecules/radicals are still challenging due to the high computational cost despite impressive progress in computing hardware and software in recent decades. Over the past decade, many empirical potentials (or force fields) have been developed to mimic electronic structure calculations' potential energy surface (PES). Such empirical potentials, including ReaxFF [18,19] and REBO [20], trade accuracy for a lower computational expense, making it possible to extend simulation scales to orders of magnitude beyond AIMD methods. ReaxFF is a bond order-based force field that describes reactive systems without a priori knowledge of the predefined reactive sites. It has been a powerful tool to study kinetic mechanisms for systems of large molecules and complex reactions. Recently, machine learning-based tools, especially neural networks (NNs), have been applied to construct PES models in an entirely datadriven manner, where the PES is abstracted from a well-selected training dataset using suitable functional expressions automatically [21]. NN models constitute a very flexible class of mathematical functions, which enable the development of PES models with the efficiency of the empirical potentials and the accuracy of the DFT method. Different NN-based PES models have been proposed for materials and biomolecules [22,23]. Zeng et al. [24] implemented an NN-based model to reveal the mechanisms of methane combustion.
In addition to developing PES models, many other methods have been developed to accelerate AIMD calculations. The "nanoreactor" is a recently introduced AIMD-accelerating simulation method [25]. The rate of reactions can be highly accelerated with a virtual piston periodically pushing molecules toward the center of the simulation. Several new pathways for glycine synthesis from primitive compounds proposed to exist on early Earth have been successfully revealed [26]. Nanoreactors would provide new insights into complicated chemical processes and discover elementary steps suitable for complex systems such as soot.
Here, we develop an NN-based model to explore the soot formation mechanism with ab initio accuracy. The NN-based model is first trained and validated against the DFT database with wide-ranging precursors involved in the soot formation process. Molecular dynamics simulations are performed to reveal the soot inception mechanism of PAH radicals using a nanoreactor scheme. The products are analyzed to understand the effects of different radicals on soot structures.

Development of the neural network potential
The NN-based potential was constructed following the Deep Potential (DP) scheme recently developed by Zhang et al. [22], which can accurately reproduce the interatomic forces and energies predicted by ab initio calculations in condensed matter systems. Figure 1 shows the illustration of DP scheme. The PES is represented by a deep neural network model that interprets the atomic coordination (R) into interatomic forces (F) and energies (E). The deep neural network contains a filter (embedding) network with three layers (25,50, 100 nodes/layer) and a fitting net with three layers (240 nodes/layer). The loss function (L) is defined as, where pe and pf are the weight for the energy and force terms, respectively. N represents the number of atoms in the structure. Similar to a classical neural network, the DP scheme trains the model by computing the gradient of the loss function using the backpropagation algorithm [27]. The NN model is trained for 1.0 × 10 6 iterations with an exponentially decaying learning rate from 1.0 × 10 -3 to 5.0× 10 -8 . The performance of NN potential highly depends on the quality of the training dataset [28,29]. Although NNs are good at interpolating between training points, they cannot predict the energies and forces of configurations distant from those of the training set. Therefore, it is critical to ensure that the dataset covers the PES of interest. The formation of molecular soot precursors starts from the fuel pyrolysis process, where acetylene and propargyl are the key intermediates. These intermediates would further react to form the first aromatic ring -benzene. Subsequent reactions lead to the formation of larger PAHs via the HACA mechanism. This paper constructs an ab initio database to cover the entire evolution from acetylene to large PAH molecules. In Fig. 2, there are 47 subsystems in the database, including alkanes, alkenes, alkynes, gaseous radicals, PAH molecules, PAH radicals, and carbon materials. Each subsystem consists of MD trajectories of a specific molecule. Details about the molecular configuration of each system are listed in Table S1. The trajectories are generated from ReaxFF MD simulation under an NVT ensemble at temperatures of 300, 1000, 2000, 3000, 4000, 5000, and 6000 K.
To obtain accurate energies and forces, 1000 configurations are randomly selected from a 40-ps ReaxFF MD simulation, and high-level DFT calculations are further conducted. DFT calculations were performed using the CP2K package [30]. Core electrons are treated using Goedecker−Teter−Hutter (GTH) pseudopotentials and the Perdew Burke Ernzerhof generalized gradient approximation method [31,32]. The Grimme DFT-D3 method [33] is used to account for dispersion interactions. A double-zeta Gaussian basis set plus polarization (DZVP-MOLOPT) [34] is considered. In addition to MD simulations, configurations are also obtained using active learning sampling as implemented in the DP-GEN package [35] with a temperature range from 300 to 4000 K. With the above method, an ab initio database with energy and force information is constructed to train the NN potential.

Nanoreactor based MD simulations
The soot formation process is investigated with our NN potential. Two kinds of systems are studied: acetylene for their polymerization and the formation of aromatic rings and PAH π-radicals and σ-radicals for physically stabilized soot inception. Figure 3 shows the structure of the PAH radicals studied in this work. PAH-a, -c, and -e are σ-radicals involved in soot inception, while PAH-b, -d, -f, and -g are π-radicals reported in simulations and experiments [12,16]. Before MD simulations, monomers of acetylene and PAH radicals were optimized using the DFT method with the B3LYP/6-311G(d,p) level of theory. The optimized monomers were duplicated 1000 times for acetylene and 50 for PAH radicals and then randomly distributed in a cubic box. The equations of motion are integrated by the velocity Verlet method using periodic boundary conditions. A Nose-Hoover thermostat is applied with an equilibrium temperature of 1500 K and a dump parameter of 20 fs. A one ns MD simulation with a time step of 0.l fs is performed for both acetylene and PAH radical systems. The nanoreactor method is used to accelerate the simulation, where a virtual piston in a sphere is imposed on the system during MD simulations. The virtual piston is defined by a time-dependent boundary potential [25]: where k=0.01 kcal mol -1 Å, τ = 1.0 ps, T = 2.0 ps.r0 is the sphere radius, m is the atomic mass, ⌊ ⌋ is the floor function and θ is the Heaviside step function. To minimize the temperature fluctuations in nanoreactors, the compression process is divided into three stages. The boundary potential oscillates between a large value of r1, a median value of r2, and a small value of r3. The r values are determined by the system density. We use three values of 0.1, 0.4 and 0.7 g/cm 3 to calculate the corresponding r1, r2, and r3 values. Detailed model settings are listed in Table S2.

Validation of NN-model
The performance of the NN potential is tested against the ab initio database. Figure 4 shows the prediction of DFT energies and forces using the NN potential trained by the DP scheme. The results predicted from the ReaxFF forcefield [36] are also listed for comparison. Detailed mean absolute errors (MAEs) are listed in Table S2. We found that the MAE values increase with the system temperature in MD sampling. The DP model shows a lower error than the ReaxFF model (0.67 v.s. 1.18 kcal/mol/atom), suggesting the good fitting ability of the DP scheme. Surprisingly, we find that the DP model shows excellent performance for force prediction, while the ReaxFF model seriously underestimates the interatomic forces (38.62 kcal/mol/Å). This can be explained by the parameterization of ReaxFF, as only energy information is taken as the training targets. By considering the forces in loss function (Eq. 1), the DP model significantly improves the model prediction and, therefore, accurately captures the dynamic evolution of the soot formation process. We also test the computational cost of DP and ReaxFF potentials on C2H2 systems with 20 to 160,000 atoms. The NNMD simulations of the DP model are performed with one NVIDIA V100 GPU, and the ReaxFF MD simulations are on a 64-processor server with two AMD EPYC 7452 CPUs. Figure 5 shows that our DP model follows an O(N) scaling rule, while ReaxFF exhibits an O(NlogN) scaling rule. The better scaling performance of the DP model enables the exploration of a system with tens of thousands or even millions of atoms at ab initio accuracy, providing a possibility to investigate the entire evolution of soot particles from an atomic perspective. Fig. 5. The computational cost of DP (red square) and ReaxFF (blue circle) models on C 2 H 2 systems with 20 to 160,000 atoms. The system lengths are from 1 to 28 nm. The insert shows snapshots of the corresponding systems.

Acetylene polymerization into PAH
First, we analyze the polymerization of acetylene, which corresponds to the formation of soot precursor -PAH molecules. The reaction pathway from acetylene to benzene is shown in Fig. 6a. The polymerization starts with the collision and addition reaction between two acetylene molecules at 1.5 ps to generate an HC≡CH-CH≡CH molecule, which further grows into a long chain molecule (e.g., HC≡CH-CH≡CH-CH≡CH) with the addition of another acetylene molecule at 3.6 ps. This species further forms a benzene via an isomerization reaction at 6.4 ps, and this pathway is consistent with a previous work [37]. Figure 6b shows the pathway from benzene to the 2 nd aromatic ring. The hydrogen atom in benzene is abstracted by other radicals to generate a reactive site. Then, a CCH radical attacks the reactive site at 11.1 ps to form a benzene with an acetylene chain, and this structure is expected to be stable at flame temperatures [38]. At 12.4 ps, the second addition reaction occurs near the first acetylene chain and forms the 2 nd aromatic ring after long isomerization (~5.1 ps). In Fig. 6c, we also observe the formation of a large PAH with four aromatic rings at 101.7 ps, which follows similar acetylene addition and isomerization pathways. The dynamic evolution of the main species during the polymerization process is shown in Fig. 7. Since there are a large number of isomers for CxHy species, the products are grouped by their carbon atom numbers. It shows that the C2 group is rapidly consumed within the first 10 ps. The insert in Fig. 7a illustrates the main species of each group, which is extracted from MD trajectories using the ReacNetGenerator package [39]. The main composition includes acetylene, but there are also hydrogen transfer and elimination reactions to produce CCH2 and CCH radicals. C4 and C6 groups are the main products of C2 polymerization in the first 20 ps. In Fig. 7b, the C7-10 group represents the growth of the 2 nd ring, which reaches a maximum at ~ 30 ps and starts to decrease. Groups with larger carbon atom numbers follow the same trends but at a slow rate. For example, the time needed to reach the maximum number of molecules was ~40 ps for the C11-15 group and ~200 ps for the C15-20 group. In the above simulation, the growth of large PAHs from small acetylene is observed, proving the feasibility of studying soot formation using our NN model. Fig. 7. Evolution of species during the simulation for (a) C 2 , C 4 , C 6 , (b) C 7-10 , C 11-15 , C 15-20 , and C 20+ groups. The inserts in (a) show the main species for the C 2 , C 4 , and C 6 groups.

Physically stabilized soot inception of PAH radicals
The inception process of PAH radicals in Fig. 5 is investigated using a nanoreactor scheme, where a sphere boundary potential with periodic changes in radius is imposed on the PAH radicals to accelerate the collision and reaction process. Seven PAH species are considered (Fig. 3). For each type of PAH, the atom number in the maximum cluster is calculated and normalized by PAH size for comparison. Figure 8a shows the normalized max cluster size of PAH-b andf in the first 40 ps. In Fig. 8b, the reactor radius (r) shows a variation with the time-dependent rectangular waveform. The period of each cycle is 4 ps. It starts with a relaxing stage of r=24.8 Å within the first 2 ps, followed by two compression stages with r=15.6 Å and r=13.0 Å for 1 ps each. Later, a new relaxing and compression cycle continues until simulations end. A reversible inception process is observed in the nanoreactor. During the compression stage, the reactor is compressed into a small volume (1/7 of the volume in the relaxing stage). PAH radicals collide with each other to form PAH clusters via physical interactions or chemical bonds. The reactor expands out with cluster decomposition in the next relaxation stage. The evolutions of cluster size are different for PAH-b and PAH-f. Cluster sizes in PAH-b fluctuate with the variation of reactor radius and remain a small value after 10 cycles (e.g., 40 ps). In contrast, the cluster for PAH-f gradually grows to a size of 44, although a decomposition stage is also observed in each relaxing stage. The clustering behaviors of PAH-b are somehow unexpected, as PAH-b is reported [13] to form a crosslinked dimer. However, within the simulation time in the nanoreactor, it cannot further grow into large clusters. The evolution of cluster numbers (Ncluster) and normalized maximum cluster size (Scluster) is listed in Fig. 9. To inhibit fluctuations induced by the compression and relaxing process, only the maximum of Ncluster and the minimum of Scluster in each relaxing stage are plotted, which represent the actual clustering state. For small radicals (<200 Da), such as PAH-a and PAH-b, the number of clusters decreases to ~40 at 100 ps, while the maximum cluster size remains lower than 5. As the physical interaction is proportional to molecule size, it is expected that the inception of these radicals is only induced by a chemical bond. The fraction of reactive collisions (Freac) for these PAH radicals has been systematically examined by Martin et al. [16], which indicates the possibility of forming covalent bonds between two PAH radical monomers in a collision. PAH-a has a high fraction of reactive collisions (e.g., 0.023) among these PAH radicals. However, chemical bonding does not generate a large number of clusters within the simulation time. For medium-sized radicals (200-400 Da), such as PAH-c and -d, cluster sizes continue to grow at a flame temperature (1500 K). These radicals are much smaller than the minimum requirement for physical inception (e.g., 667 Da [10]), indicating the feasibility of physically stabilized soot inception. For large radicals (>400 Da), such as PAH-e, -f, and -g, the growth rates of cluster size are faster than those for medium-sized PAH radicals due to the strong physical interaction. The fraction of reactive collisions has a relation of PAH-g (Freac=0.024) > PAH-f (Freac=0.013) > PAH-e (Freac=0.008) but shows the same clustering trends (see Fig. 9b), suggesting that physical interactions play a more important role than chemical bonds for large radicals. Chemical nucleation is also observed in these radicals, but their rates are much lower than those of physical nucleation, which agrees with the calculations by Menon et al. [14]. In addition, we also note that the chemical nucleation in the large cluster is slightly different from that in the dimer. Since the radicals in the dimer are always in a parallel stacked structure, Martin et al [16]. reported that σ-radical dimers have weaker physical enhancement than π-radicals because the C-C bond prefers to be collinear with the aromatic planes. However, there are multiple stacks in a large cluster, which enable chemical nucleation between σ-radicals in the same plane. For example, chemical cross-links between different stacks of σ-radicals are observed in PAH-e clusters. In Fig. 10a, we summarize the physical enhancement of the nucleation mechanism of dimers and large clusters. In a dimer, PAH radicals are stacked in parallel due to physical interactions, which can accelerate the cross-linking between π-radicals but does not affect σ-radicals. In a large cluster, there are multiple stacks, and σ-radicals can form cross-links with adjacent stacks. Figure 11 includes the final products collected from different precursors in the nanoreactors, which are mapped according to the precursor properties. These products are postprocessed using the TEMSIM package [40] to obtain simulated TEM images, which can be directly compared to the structures seen in HRTEM experiments [41,42]. For PAH-a and PAH-b, the PAHs are arranged in a gaseous form. Most radicals form cross-linked dimers or trimmers, and no large cluster is observed. For PAH-c and PAH-d, a prominent cluster is produced in the box center. The structure of PAH-c is disordered and loosely packed, similar to the covalently bound incipient particle predicted by Johansson et al. [11]. For PAH-d, a coreshell-like structure is observed, where the core region shows disordered and cross-linked structures, and the shell region (upper right part) is ordered and closepacked stacks. For large radicals, such as PAH-e, -f, and -g, the final products are spherical stacked structures with cross-links between different layers. These results suggest that the morphology of the final products is highly related to the size of the precursors. At flame temperatures, no PAH cluster can form when the molecular mass of precursors is lower than 200 Da. This critical mass is lower than the value of 666 Da for physical nucleation of PAH molecules simulated by Mao et al. [36]. This can be attributed to the higher reactivity of PAH radicals, which can form cross-links and lead to chemical nucleation. For PAH molecules without reactive sites, a higher temperature (2500 K) is required for chemical nucleation, which is too high for typical sooting flames. As the size of the precursors increases, the product particle changes from a disordered cluster into an ordered and close-packed structure. It is interesting to note that the product of PAH-d has a core-shell structure, which includes a disordered core and outer shell of stacked PAHs [41,42]. Such structures are also reported in HRTEM experiments [41,42]. Another piece of evidence is from a recent mass spectrum experiment [43], which shows that PAHs of 239-838 Da are the main component of soot particles, suggesting that physically stabilized soot inception of PAH radicals larger than 200 Da is a potential inception mechanism.
In summary, the inception mechanism of PAH radicals can be divided into three groups: chemical, "physical+chemical", and physical, as shown in Fig. 10. When the radical molecular mass is lower than 200 Da, the physical interaction is too weak at flame temperature, and chemical nucleation is the dominant mechanism. With increasing radical mass, the physical interaction becomes stronger, and a stack cluster structure can be formed with the combined effect of physical and chemical nucleation. For large radicals (>400), the physical interaction becomes the dominant nucleation mechanism.

Conclusions
In this work, we develop a neural network (NN)based model to explore the soot formation mechanism with ab initio accuracy. NN-based molecular dynamics simulations (NNMDs) of acetylene and PAH radicals of different sizes are performed using a nanoreactor scheme to investigate the mechanism of soot inception. The product structures of PAH radicals are analyzed to interpret the potential nucleation mechanism for different sized radicals.
Our NN model considers both energy and force information from high-level DFT calculations, which has been proven to have significantly higher accuracy than the ReaxFF model. In particular, the ReaxFF model is shown to underestimate the atomic force of DFT calculations, which affects the model accuracy when dealing with the reaction dynamics process. Our NN model also indicates that good scaling performance follows O(N), providing the possibility to investigate the entire evolution of soot particles from an atomic perspective.
The inception mechanism of PAH radicals is investigated with our NN model. We find that the physical interaction enhances chemical nucleation, and such enhancement is observed for clusters of π-and σradicals, which is distinct from the dimer. It is also found that PAH radicals can produce soot particles at flame temperatures (1500 K), and the particle morphology is strongly affected by PAH size. For PAH radicals of 374 Da, a core-shell soot particle with a disordered core and outer shell of stacked PAHs is observed, and similar structures are also observed in HRTEM experiments. These results suggest that physically stabilized soot inception of PAH radicals larger than 200 Da is a potential nucleation mechanism.  Table S1. The ab initio database configurations; Table S2. Nanoreactor configurations for PAH radicals; Table S3. Model prediction for the DP model and the ReaxFF model against DFT calculations.