Affordable Ab Initio Path Integral for Thermodynamic Properties via Molecular Dynamics Simulations Using Semiempirical Reference Potential

Path integral molecular dynamics (PIMD) is becoming a routinely applied method for the incorporation of the nuclear quantum effect in computer simulations. However, direct PIMD simulations at an ab initio level of theory are formidably expensive. Using the protonated 1,8-bis(dimethylamino)naphthalene molecule as an example, we show in this work that the computational expense for the intra-molecular proton transfer between the two nitrogen atoms can be remarkably reduced by implementing the idea of reference-potential methods. The simulation time can be easily extended to a scale of nanosecond while maintaining the accuracy on an ab initio level of theory for thermodynamic properties. In addition, the post-processing can be carried out in parallel on massive computer nodes. A 545-fold reduction in the total CPU time can be achieved in this way as compared to a direct PIMD simulation at the same ab initio level of theory.

fer between the two nitrogen atoms can be remarkably reduced by implementing the idea of reference-potential methods. The simulation time can be easily extended to a scale of nanosecond while maintaining the accuracy on an ab initio level of theory for thermodynamic properties. In addition, the post-processing can be carried out in parallel on massive computer nodes. A 545-fold reduction in the total CPU time can be achieved in this way as compared to a direct PIMD simulation at the same ab initio level of theory.

Introduction
Hybrid QM/MM is now the method of choice for the studies of enzymatic reactions and chemical reactions in the condensed phase. [1][2][3][4][5][6][7][8][9][10][11][12][13] However, the application of QM/MM methods is often plagued with extremely poor computational scaling of the ab initio QM methods, as well as the long time scales of molecular dynamics propagation that are usually required before any essential dynamic processes for the degrees-of-freedom (DoF) orthogonal to the boosted one can be observed. 14 Furthermore, exploring the nuclear quantum effect (NQE), such as tunneling, nuclear delocalization phenomena, and zero-point energy of molecules, gains in popularity over the years. [15][16][17][18][19][20][21][22] One of the most appealing approaches to incorporating NQE is path integral molecular dynamics (PIMD). 23 PIMD is based on the isomorphism to approximately transform the representation of quantum effect in single-particle systems into a classical system with discrete finite number of pseudo-particles (or beads) along a cyclic path. 24 Being one of the most commonly used simulation methods, PIMD can be used to obtain thermostatistical equilibrium properties and dynamic properties, with free energy profiles of reactions and reaction rate coefficients as two representative examples. [25][26][27][28] However, applications of PIMD simulations in the studies of chemical reactions of complex systems are extremely demanding. On the one hand, PIMD requires P times more computational time than that of the classical molecular dynamics (MD), where P is the number of beads, onto which each quantum particle is mapped. P determines the convergence of PIMD simulations, and it needs to satisfy P >hω max /k B T , where ω max is the maximal vibrational frequency of the system. P surges as ω max increases or T decreases. It is thus computational expensive to use PIMD to study a reaction in a sufficiently long timescale at an acceptable accuracy with convergence-ensured number of beads. To reduce the computational cost, some progresses have been achieved. Tuckerman et al combined a noncanonical transformation on the quadratic part of the action with multiple time scale integration techniques, and proposed a very efficient algorithm. 29 Markland and Manolopoulos proposed an approach to reduce the computational effort by separating the short-range interaction from the long-range one. 30 Cheng et al proposed the multiple-timestep molecular dynamics (MTS-MD) algorithm to accelerate the propagation by dividing the force into slowly oscillating part and quickly varying component, 31 but the former factor is still the efficiency bottleneck. Large integration time step can also be made possible by applying proper transformations, so the analytic integration over the high-frequency modes is allowed. 32 After the transformation, the number of the beads can be reduced via ring-polymer contraction, in which some highest normal modes can be ignored. [33][34][35] Marsalek and Markland proposed a ring polymer contraction approach that can dramatically reduce the computational cost of PIMD at an ab initio level. 36 On the other hand, the accessible simulation time is further limited when it is necessary to use high-level electronic structure methods such as post-Hartree-Fock theories and density functional theory (DFT). Approximated levels of theory, such as semi-empirical (SE) methods, can improve computational efficiency and allow sufficient exploration in the phase space for systems in condensed phase. However, these methods are crude in the treatment of electronic structures, which may lead to inexact results for certain physical properties.
In addition, ergodicity in the configurational space is critical for the convergence of the

Theory
The partition function of a system with N distinguishable particles is defined as, 49 where β = 1/k B T , N is the total normalization function, x represents the position of particles, and p the momenta conjugate to x. The isomorphic primitive Hamiltonian operator under the constraint x i+1 n = x i n . This quantum Hamiltonian operator is made up of P group of beads (also known as replicas or time-slices), and each forms a classical representation of the original system. Beads of each atom are connected by harmonic potentials. In this way, a P -membered classical "necklace" model is constructed, and the simulation of quantum systems is transformed into an expanded classical calculation. m n is the mass of the particle, U is the potential energy, m i n and p i n are the fictitious mass of the ith bead of the nth particle and its corresponding momentum. This fictitious mass can be set to an arbitrary value, because this term disappears in the thermodynamical statistical average. The ω p = √ P /βh is the harmonic constraint connecting the two most adjacent beads. For the ideal quantum problem, the full quantum HamiltonianĤ is established when P → ∞. The introduction of this isomorphic relationship also means the quantum mechanical sampling can be mapped onto classical simulations by either Monte Carlo (MC) 50,51 or MD 52 methods. The potential of mean force with a collection of PIMD samples can thus be written as To accelerate computation, we incorporate of the PIMD and the reference-potential method (RPM) for the first time to calculate the statistical quantum mechanical properties at a level-of-interest Hamiltonian while simulating at a lower level Hamiltonian state.
Additionally, we adopt the US method for enhancing the sampling. From K simulations, we can collect N k configurations from the kth simulation. Each simulation is characterized by its specific potential function U k , for k ∈ 1 . . . , K. In the umbrella sampling method, where U 0 (r) is the unbiased Hamiltonian and W k (r) is the biasing potential acting on some chosen collective variables (CV) η(r). The unnormalized weight of each configuration under another Hamiltonian U t is , (5) in which f k is the free energy of state k and is calculated by iteratively solving the Multistate Bennett Acceptance Ratio (MBAR) 53 equations Here, N = K k=1 N k is total number of snapshots collected in all the simulations, and . ω t (r n ) can be understood as the weighting factor for each configuration in a thermodynamic perturbation from the mixed reference potentials (U k ) to the target Hamiltonian (U t ). Therefore, this method is referred to as the multistate thermodynamic perturbation (MsTP). Thermodynamic properties that depend only on atomic coordinates under U t can thus be obtained as If A is an indication function δ of some chosen CV ξ(r) the potential of mean force (PMF) under U t (r) can be written as defined up to an additive constant. Similarly, the PMF under U 0 (r) can be written as where is the unnormalized weight for configuration r n under U 0 (r). To ameliorate the numerical instability originated from incomplete sampling, the weights ω t (r n ) under the target Hamiltonian are scaled by Gaussian-smoothing on the density-of-states. 47 In this work, we take U 0 as the PM6/MM 54 Hamiltonian, and U t as the BLYP-D3/6-31G(d)/MM 55-57 level of theory. For simplicity, they will be labeled as the SQM and DFT level of theories. Please also note that η is unnecessarily ξ, but we set them equal in this work.

Simulations
One DMANH molecule, shown in Fig. 1, 58,59 was solvated in a TIP3P water 60 sphere with a radius of 25Å using the LEaP module in the AmberTools19 package. 61 One chlorine ion was added for neutralization. The QM region contained only the DMANH molecule, and the MM region comprised all the solvent molecules and the counter ion. In the MM region, the SHAKE algorithm 62 was applied to constrain all the bonds involving hydrogen atoms.
The system was energy-minimized for 2000 steps and then heated up to 298 K in 100 ps.
Finally, a 100-ps classical MD simulation was conducted for further relaxation of the system.

Results and Discussion
Reweighting entropy (RE), is a measure of the reliability of the RPM. We have previously shown that when RE is greater than 0.3 in the MsTP calculations, the extrapolation process is reliable. 45 When it is below 0.3, large biases may reside in the free energy profile. As shown in Fig. 2, the free energy profile under the DFT level is reliable when the CV is smaller than 0.8Å. When the CV is greater than 0.8Å, RE plunges. This is the consequence of the large difference in the preferred structures under these two Hamiltonians, i.e. SQM and DFT. Fig. 3   of the sampling inefficiency when the MBAR weights are assigned to the samples. As shown in Eq. 5, the weight decays exponentially as the energy difference between the two Hamiltonians increases. Precisely, ∆U t is the difference in the deviations of the potential energies from their respective means, since the difference in the means can be canceled in normalization.

Shown in
Under the SQM level, the minimum in the free energy profile locates at CV ≈ 0.6Å, which is about 0.1Å larger than the minimum under the DFT level. This indicates that in the optimal structure under the SQM level, the proton prefers to be closer to one of the nitrogen atoms, while under the DFT level the proton is more diffusive. This can be explained by the