Computational investigation on natural quinazoline alkaloids as potential inhibitors of the main protease ( Mpro ) of SARS _ CoV _ 2

Drug discovery is still behind in the race compared to vaccine discovery in fighting COVID-19. Recently, a few alkaloids from a traditional Indian medicinal plant, Vasaka (Justicia adhatoda), have been linked computationally to the main protease (Mpro) of SARS_CoV_2. To expand the knowledge and for further investigation, we have selected 41 quinazoline alkaloids from two natural product databases to create an adequate library and performed detailed computational studies against the main protease (Mpro) of SARS_CoV_2. The screening of the library was carried out through blending the rigid docking and pharmacokinetic analysis that resulted in nine alkaloids as initial leads against Mpro. These nine alkaloids were further subjected to advance flexible docking using first reference famotidine for the analysis of structure-based interactions. For further selection, a second screening was carried out based on binding energies and interaction profiles that yielded three alkaloids namely CNP0416047, 3hydroxy anisotine and anisotine as hits. The stereo-electronic features of hit alkaloids were further investigated through additional structure-based E-pharmacophore mapping against a second reference, known X77 ligand. Additionally, the reactivity of hit alkaloids at the binding site of the protein was estimated by measuring the electron distribution on the frontier molecular orbitals and HOMO-LUMO band energies. Finally, the stabilities of complexes between hit alkaloids with the protein were accessed extensively using robust molecular dynamics simulation through RMSD, RMSF, Rg, and MM-PBSA calculation. Thus, this study identifies three natural quinazoline alkaloids as potential inhibitors of MPro through extensive computational analysis.


Combined screening protocol based on rigid docking and pharmacokinetic analysis:
For rigid docking, advanced and widely used molecular grid-based docking program Autodock4.2 (22) was executed for screening purposes. Prior to docking, protein structure was edited by adding hydrogen atoms and gasteiger charges in autodocktools to create PDBQT files. The energy minimized ligand structures were further edited in autodocktools (ADT) by defining root center, aromatic carbons, and torsions. In grid selections, the whole receptor structure was selected for autogrid calculation with 0.375 Å spacing surrounding all possible binding sites to perform blind docking. Finally, docking studies were carried out using the Lamarckian Genetic Algorithm (LGA) based on the grid maps of all atoms present in the receptors and ligands. Parameters were set to an initial population of 150 randomly placed individuals, a maximum of 2.5x10 7 energy evaluations, cluster tolerance of 2 Å (rms), output level 0 and maximum generations of 2.7x10 4 numbers for total 10 numbers of conformations. After each docking execution, the 10 docked conformers were clustered by autodocktools based on thermodynamic parameters. The compounds exhibiting negative binding energy ≥ 9.8 Kcal / mol were chosen from this screening.
Total nine alkaloids, selected from rigid docking, were subjected for the evaluation of pharmacokinetic features and toxicity features using SwissADME server (23), pkCSM-pharmacokinetics server (24), and BIOVIA Discovery Studio ADMET descriptor (25). The ADME properties of the compounds were measured on SwissADME based on Lipinski's rule, and were further checked using BIOVIA Discovery Studio ADMET predictors. The possible toxicities of these compounds were estimated by pkCSMpharmacokinetics tool. The final screening was carried out based on ADME features, non-carcinogenic features and cut-off binding energy.

Flexible Docking of the screened compounds for structure-based assessment of binding
Flexible docking between SARS_CoV_2 M Pro with screened alkaloids and reference compound famotidine were performed using CDOCKER program of BIOVIA Discovery studio (25) for detailed structure-based assessment of binding. The retrieved protein structure from RCSB database was initially subjected to 'clean protein' protocol to eradicate alternate conformations, and any error in bond order and amino acid sequence. Thereafter, the protein was further subjected to protonation at pH 7.4 and ionic strength 0.145 (M) respectively. In addition, the CHARMm force field was applied in vacuo for the global energy minimization of the protein structure. For the targeted docking to the active site of the protein, site sphere was created with co-ordinates set to -12.0702, 13.4652, and 68.368 respectively. Finally, the CDOCKER program was executed with customized parameters where the pose cluster radius was set to 0.5, total random conformations were set to 10, total dynamics steps were set to 1000. Additionally, in the simulated annealing section heating steps and target temperature were set to 2000 and 700 K simultaneously, cooling steps and target temperature was 5000 and 300 K simultaneously. After each docking, ten docking conformers were ranked automatically based on -CDOCKER energy, and evaluated further for molecular-level analysis of the bindings.

HOMO-LUMO measurement by density functional theory calculation:
The geometry of the alkaloid molecules were optimized in gas phase using B3LYP functional as implemented in Gaussian 09 software (26) using the 6-311+G(d,p) basis set. The geometry optimization of the molecules was also done using another hybrid density functional with empirical dispersion correction (ωB97XD) (27) with same basis set for verification. The energies of the Highest Occupied Molecular Orbital (HOMO) and Lowest Unoccupied Molecular Orbital (LUMO) and the electron density distributions of these frontier orbitals were captured by using GAUSS-VIEW software. The HOMO-LUMO energy gap has been linked to reactivity of a molecule and extrapolated to the activity of the bound inhibitor inside the enzyme catalytic cavity (28).
2.5. E-pharmacophore modeling: E-pharmacophore model was built using the recently published cocrystal structure of X77 inhibitor with protein M Pro (PDB ID 6W63). For this purpose, the co-crystal structure was retrieved from RCSB database, then 'interactive pharmacophore model' was generated in BIOVIA Discovery studio by selecting the protein as receptor and the X77 inhibitor as ligand. The protocol was set to calculate maximum ten pharmacophores with maximum features 6 at minimum inter-feature distance 2 with steric volume excluded for all types of non-bond interactions. The vacant E-pharmacophore was then mapped with the energy-optimized structures of selected alkaloids based on the common features. Prior to mapping, alkaloids were structurally optimized using CHARMM force field with Momary-Rone partial charge at pH 7.4 and all possible tautomers, stereoisomers and protonation states were generated.

Molecular Dynamics Simulation:
All the molecular dynamics simulations were performed using standard dynamics cascade protocol of BIOVIA Discovery studio. The top docked poses of all the M Proinhibitor complexes were further edited by protonating at pH 7.4 at dielectric constant of 10 and subjected to CHARMM36 force field. Each system was placed in the centre of the orthorhombic simulation box modeled by the explicit periodic boundary solvation model. A minimum distance from the edge of the box was set to 7 Å and 0.145 (M) NaCl was added to each of the systems. The protein only (M Pro ), protein-reference complex (M Pro -famotdine), and three final protein-inhibitor complexes (M Pro -CNP0416047, M Pro -3-hydroxyanisotine, M Pro -anisotine) were solvated with 6358, 6335, 6332, 6313, 6325 no of water molecules in the orthorhombic simulation box. All the solvated systems were minimized in two stages to sort out poor contacts within the system and to warrant the systems to gain a low energy point as required for the successive stages. Then the solvated systems were subjected to 2000 steps of the steepest descent algorithm in the first step followed by the 5000 steps of the conjugate gradient. Additionally, each of the minimized systems was slowly heated from 50 K to 300 K at a time period of 120 ns. Then the systems were equilibrated for 1 ns at a constant temperature of 300 K. In both the heating and equilibration steps, the adjusted velocity frequency was set to 50. The production step was performed for 20 ns for each of the systems using NVT ensemble at 300 K. The results of the production steps were saved after every 2 ps. During the simulation, the non-bond higher cut-off distance and lower cut-off distance were kept at 12 Å and 10 Å respectively, and the spherical cut-off method was used for the electrostatics calculation.

Molecular mechanics-Poisson-Boltzmann surface area (MM PBSA) method:
To calculate the free energy of binding and to score the inhibitors based on it, the molecular mechanics-Poisson-Boltzmann surface area (MM PBSA) method was carried out after simulation using the BIOVIA Discovery studio. Purposefully, the trajectories of the simulations were processed before the MM PBSA calculation and the trajectories of last 5 ns simulation data with frame increment of 250 was chosen for MM PBSA calculation. The MM PBSA calculation of free energy of binding is a slowest solvent approximation method based on continuum electrostatics and a summation of the calculation of potential energy, polar and non-polar solvation energy components and depends on the free energy of complex, ligand, and receptor as following as (29,30):  (Figure 1). Additionally, each Domain II and III are connected by a Loop (amino acids 185 -200) that is crucial for the enzymatic action of main proteases (6). The substrate-binding site and catalytic dyad (Cys145, His41) is located at a cleft between Domain I and Domain II. Furthermore, the substrate binding site is divided into four sub-sites: S1 located nearby Cys145, S1' located nearby His41, S2 located nearby hydrophobic residues namely Tyr54, and S4 located nearby the Loop region. Moreover, we have selected famotidine as one of the reference compounds as it's linked to M Pro inhibition computationally (31,32,33,34) and also implicated to SARS_CoV_2 growth inhibition (35). Additionally, we have chosen X77 molecule as a second reference in E-pharmacophore modeling as X77 is a non-covalent inhibitor and the co-crystal structure of X77-M Pro is available (PDB ID 6W63) (36,37).

Figure 1:
The X-ray crustal structure (PDB ID 6LU7) of the main protease (M Pro ) of SARS_CoV_2, the protein is displayed as cartoon diagram showing the domain I (in red color), domain II (in green color), domain III (in yellow color), and loop (in magenta color). The active site cavity remains at the cleft between domain I and II, the catalytic dyad (Cys145 and His41) are displayed in orange color, the other crucial residues namely Met45, Tyr54, Phe140, His163, Met165, Asp166, Phe185, and Glu189 are shown as sticks. The subsites (S1, S1', S2, S4) are shown by arrow marks.

Screening of library:
Inspired by the initial reports of natural compounds showing promising results against M Pro (15,16), we designed a computational strategy for the detailed investigation of quinazoline pharmacophore against the main protease of SARS_CoV_2. Purposefully, we constructed an adequate library of 41 natural quinazoline alkaloids ( Figure 2) from two natural product databases, COCONUT (17) and NPBS (18), for a full-scale investigation. The initial screening of 41 compounds against our target M Pro was a necessary task and achieved by blending rigid molecular docking and pharmacokinetic analysis (Figure 3). In brief, the entire library of compounds along with reference famotidine was subjected to grid-based rigid docking by Autodock4.2 using blind docking protocol where the entire protein was selected in grid mapping covering all possible binding sites. It was observed that all the alkaloids bind at the substrate binding cavity of the main protease and interact with the loop crucial for the enzymatic mechanism of the target (6). Comparing the binding energy and inhibition constant found from docking studies, we observed that alkaloids having an aromatic moeity at C3 of tetrahydropyrrole ring and at C6 of quinazoline frame displayed better result. For the screening purpose, the alkaloids were ranked according to their negative binding energy as obtained from the rigid docking as shown in Table 1. An initial cut-off was employed and nine compounds were selected having negative binding energy ≥ 9.8 Kcal/mol for the second screening based on pharmacokinetic analysis. These nine compounds were further evaluated for their pharmacokinetic and toxicity features using SwissADME server (23), pkCSM-pharmacokinetics server (24), and BIOVIA Discovery Studio ADMET predictors (25). The ADME properties of the compounds were measured on SwissADME based on Lipinski's rule (Supporting Table 1), and were further checked using BIOVIA Discovery Studio ADMET predictors (Supporting Figure 1). Additionally, the possible toxicities of these compounds were estimated by pkCSM-pharmacokinetics tool. The molecular weight of all the quinazoline alkaloids ranges between 335.36 to 427.45 indicating the efficient absorption, transportation and diffusion in the body. Moreover, the TPSA values of the 9 alkaloids were found to be in the range between the 70 -90 Å 2 indicating good intestinal absorption and cell permeability. Overall, the initial screened alkaloids showed good bio-availabilities based on Lipinski's rule and toxicity prediction.

Advanced flexible docking to establish the structure based interactions:
The initially selected nine alkaloids from the screening were further subjected to the advanced flexible docking against SARS_CoV_2 M Pro along with the reference famotidine to assess the structure based interactions. Purposefully, the CDOCKER program of BIOVIA Discovery studio (25) equipped with flexible docking at variable temperatures using CHARMm based energy function was utilized for the detailed structure-based assessment of binding. As these alkaloids were found to be catalytic cavity binders in rigid docking as shown in screening section, therefore the flexible dockings were carried out inside the active site cavity sphere encompassing all subsites and crucial residues. The alkaloids were docked separately using customized protocol as described in materials and method section, and were ranked according to their best negative CDOCKER Interaction energies as shown in Table 2. Furthermore, the interactions of all alkaloids with the crucial residues present at enzyme active site cavities were examined carefully in terms of classical hydrogen bond interaction, non-classical hydrogen bond interaction, hydrophobic interaction, and electrostatic interaction ( Table 3, the 2D interaction diagrams all alkaloids with the protein M Pro were given in Supporting Figure 2). All the alkaloids and the reference compound famotidine were found to be involved in multiple points interactions with the residues present in M Pro active sites. Interestingly, the data in Table 2 clearly depicts that alkaloids named CNP0368366, CNP0416047, 3-hydroxy anisotine, and Anisotine showed better or almost equal interaction energies compared to reference famotidine. Although the Anisotine was previously linked to M Pro binding (15,16), the other three alkaloids namely CNP0368366, CNP0416047 and 3-hydroxy anisotine are entirely new leads against M Pro . Encouraged by these findings, we have chosen CNP0416047, 3-hydroxy anisotine, and anisotine as hit compounds and further examined their interaction profiles along with reference famotidine in details (Figure 4 and Figure 5). Regretfully, we neglected CNP0368366 from further examinations as it showed probable carcinogenicity in ADMET prediction (Supporting Table 1). The reference famotidine docked between the domains I and II and spanned the area between subsite S1, S1' and S4 by interacting with Asn142 (2.71Å), His163 (2.25Å), Phe140 (2.29Å), Met165(2.80Å), Arg188 (2.50Å), Glu166 (2.20Å) through conventional hydrogen bond, with Asp187 (2.75Å), Arg188 (2.75Å) via non-classical hydrogen bonds, and with active site residues His41 (4.74Å), Cys145 (5.37Å) through long-distance electrostatic interaction, but failed to provide any strong hydrogen bonding interaction with catalytic residues. The alkaloid hit CNP0416047 positioned similarly inside the catalytic cavity and showed strong conventional hydrogen bond interaction with active site residue CYS145 (2.66Å) and conserved neighbouring residue Asn142 (2.06Å), and displayed non-classical hydrogen bond and hydrophobic interactions with multiple residues interactions with Leu141 (2.58Å), His164 (3.05Å), Glu166 (2.99Å), Met165 (4.93Å), Met165 (4.93Å), Leu141 (4.80Å) along with catalytic residue His41 (4.80Å). In addition to spanning S1, S1' and S4 subsites, alkaloid CNP0416047 also covered S2 subsites through exhibiting hydrophobic interactions via its tyrosine moiety linked to C6 of quinazoline ring. The next alkaloids 3-hydroxy anisotine and anisotine showed similar interactions and covered the S1, S1' and S4 subsites of catalytic cavity. Both 3-hydroxy anisotine and anisotine exhibited strong conventional and non-classical hydrogen bonding interactions with residues present near S1 subsite such as His163, Ser144, Met165, Glu166, Phe140, and showed hydrophobic interactions with catalytic residues Cys145 and His41. Compared to anisotine, the 3-hydroxy anisotine showed one extra conventional hydrogen

E-pharmacophore mapping of the hits:
E-pharmacophore hypothesis has recently emerged as a useful tool in computer-aided drug discovery and structure-based screening (38,39,40). In Epharmacophore hypothesis, the combination of stereo-electronic features and energetics of a known bound ligand within the complex of its' biological target is measured computationally and later mapped with new ligands. In this study, we used E-pharmacophore model to compare the stereo-electronic features and energetics of our hit alkaloids within M Pro binding site using the complex of the protein with another known non-covalent inhibitor, X77 (N-(4-tert-butylphenyl)-N-[(1R)-2-(cyclohexylamino)-2-oxo-1-(pyridin-3-yl)ethyl]-1H imidazole-4-carboxamide) (36,37). In order to do so, the crystal structure of M Pro bound with non-covalent X77 inhibitor was retrieved from RCSB database and processed as mentioned in the materials and method section to generate the energyoptimized E-pharmacophore model. As a result, total ten pharmacophores were generated having a total 12 non-bond, 4 hydrophobic and 7 hydrogen bond interactions and we selected the pharmacophore having the highest features (AAHHaromHaromHarom where A: Hydrogen bond acceptor, H: Hydrophobic, Harom: Hydrophobic aromatic respectively) for mapping our hit alkaloids (Figure 6). The chosen pharmacophore was then screened with alkaloid hits CNP0416047, 3-hydroxy anisotine, and anisotine respectively as one to one mapping based on a cut-off of minimum 4 features. Prior to the mapping, all three alkaloids were structurally optimized using CHARMm force field with Momary-Rone partial charge at pH 7.4 and all possible tautomers, stereoisomers and protonation states were generated. It was found that all three selected alkaloid hits were mapped to '5 features hits' indicating the hit alkaloids displayed the 5 out of 6 features in the mapping with respect to the pharmacophore.
Furthermore, the mapping was scored based on a 'fit value' and anisotine displayed the best fit value of 1.50884, whereas CNP0416047 mapped with lowest fit value of 0.644164 with respect to the pharmacophore. Figure 6: E-pharmacophore model and mapping of alkaloid hits. A. Generated E-pharmacophore model of X77 inhibitor bound inside M Pro ; B. five features best hit mapping of CNP0416047 against the pharmacophore; C. Five features best hit mapping of 3-hydroxy anisotine against the pharmacophore; D. Five features best hit mapping of anisotine against the pharmacophore.

HOMO-LUMO calculation of the hits using density functional theory:
DFT based calculation was carried out for our selected hits CNP0416047, 3-hydroxy anisotine, anisotine and for reference famotidine in order to examine the electron distribution on the frontier molecular orbitals and to calculate the HOMO-LUMO band gaps. The HOMO-LUMO energy gap has been linked to reactivity of a molecule and extrapolated to the activity of the bound inhibitor inside the enzyme catalytic cavity (28). The calculations were carried out using two different methods, B3LYP/6-311+G(D,P) ( Table 4) and w-B97XD/6-311+G(D,P) (Supporting Table 2), and the independent findings of both calculations supported each other. As shown in Table 4, the HOMO-LUMO bandgaps of CNP0416047, 3-hydroxy anisotine and anisotine are comparable to the band gap of reference famotidine predicting that HOMO of inhibitor may smoothly transfer the electrons to the LUMO while interacting with amino acid residues at M Pro binding site. Moreover, the electron distribution at the frontier molecular orbital diagram of the hits corroborated with the findings from docking studies (Figure 7). The HOMO-LUMO structures of both 3-hydroxy anisotine and anisotine displayed high electron density at the aromatic moiety attached to 3-position of tetrahydropyrrole ring showing that these aromatic moieties can exhibit strong interaction with protein residues, which supported the screening results as the alkaloids having similar structures exhibited better interaction profiles in rigid docking. Additionally, the HOMO structure of CNP0416047 has electrons localized on quinazoline frame whereas the LUMO structure has electrons on tyrosine moiety attached to C6 of quinazoline ring. From the HOMO and LUMO structures, it can be concluded that the quinazoline frame of CNP0416047 is available to donate electron to the interacting residues of the protein and in agreement with the docking results as explained in previous section.

Molecular dynamics simulation of the hits:
The hit alkaloids CNP0416047, 3-hydroxy anisotine, and anisotine were subjected to molecular dynamics simulation along with reference famotidine to examine the conformational stability and compactness of the their complexes with the target. Purposefully, the unligated protein (only M Pro ) and protein-hit complexes (M Pro -famotidine, M Pro -CNP0416047, M Pro -3-hydroxy anisotine, and M Pro -anisotine) were simulated for 20 ns using BIOVIA Discovery Studio-Standard Dynamics Cascade program using the customized protocol as mentioned in materials and method section. In each simulation run, trajectories of 10,000 conformations were analyzed and several properties like RMSD, RMSF, radius of gyration, and binding energy were explored.
The conformational stability was evaluated by RMSD analysis by calculating the RMSD data of alpha carbon atoms of the protein backbone for all the protein-inhibitor complexes and the unligated protein with respect to the initial structures. During MD simulation the system was heated to 300 K for 120 ps at first, then equilibrated for 1 ns and then went through production phase for 20 ns beginning at 1.12.ns. The average RMSD value for M Pro protein was found to be 0.170 ± 0.027 nm and for the M Profamotidine, M Pro -CNP0416047, M Pro -3 hydroxy anisotine, and M Pro -anisotine complexes were found to be 0.168 ± 0.028, 0.165 ± 0.021 nm, 0.181 ± 0.0175, 0.166 ± 0.030 proving the conformational stabilities of the protein-inhibitor complexes ( Table 5). The RMSD of the native protein slightly fluctuated between 0.9 nm to 0.16 nm for the first 5 ns period of time and remained between 0.15 to 0.21 nm for the rest (Figure 8). The fluctuation of a group of atoms and the flexibilities of different regions of protein were analyzed from RMSF calculation. In case of protein only, the higher fluctuation was observed for the Glu47 residue (0.27 nm) in domain I and for Tyr154 residue (0.24 nm) in domain II, but the rest of residues in domains I and II had low fluctuation below 0.2 nm. In the case of protein-hit complexes, the conformational fluctuation was higher at the domain III as compared to the domains I and II due to binding of the hits in between domains I and II. The average RMSF value of the native protein was found to be 0.119 nm, whether the protein-inhibitor complexes displayed lower fluctuations than M Pro displaying the stabilities of the complexes after binding (Table 5, Figure 8). Moreover, the detailed analysis of RMSF fluctuations of different residues and domains (I, II, III, and loop) were carried out as shown in supplementary  Figure 3). Therefore, the calculated Rg values signify that the unligated protein became more compact in nature due to complexation with the inhibitor molecules and gained stabilities.
Moreover, in order to calculate the binding affinities of inhibitors towards the protein target, the molecular mechanics-Poisson-Boltzmann surface area (MM PBSA) method was carried out. For this purpose, trajectories of the simulations were processed and last 5 ns simulation data with frame increment of 250 were selected for MM PBSA calculation. The result of this calculation showed that the reference compound famotidine exhibited negative binding energy of 19.88 ± 0.71 Kcal/mol, whereas alkaloid hits CNP0416047, 3-hydroxy anisotine and anisotine displayed comparable negetaive binding energies of 22.71 ± 0.70 Kcal/mol, 21.77 ± 0.68 Kcal/mol, and 31.38 ± 0.72 Kcal/mol respectively ( Table 5).

Name of the complex
Average RMSD (nm)