Analysis of Whole Genome Sequences and Homology Modelling of a 3C Like Peptidase and a Non-Structural Protein of the Novel Coronavirus COVID-19 Shows Protein-Ligand Interaction with an Aza-Peptide and a Noncovalent Lead Inhibitor with Possible Antiviral Properties

The family Coronaviridae consists mainly of virulent pathogens with zoonotic properties. Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) and Middle East Respiratory Syndrome coronavirus (MERS-CoV) of this family have emerged before, and now the Novel COVID-19 has emerged in China. Characterization of spike glycoproteins, polyproteins and other viral proteins is important for vaccine development. Homology modelling of these proteins with known templates offers the opportunity to discover ligand binding sites and possible antiviral properties of these protein-ligand complexes, and any information emerging from these protein models can be used for vaccine development. In this study, we performed a complete bioinformatic analysis, sequence alignment, comparison of multiple sequences and homology modelling of the Novel COVID-19 whole genome sequences, the spike protein and the polyproteins for homology with known proteins; we also analysed receptor binding sites in these models for possible vaccine development. Our results showed that the tertiary structure of the polyprotein isolate COVID-19 _HKU-SZ-001_2020 had 98.94 percent identity with SARS-Coronavirus NSP12 bound to NSP7 and NSP8 co-factors. Our results indicate that a part of the viral genome (nucleotides 254 to 13480 in Frame 2, encoding 4409 amino acids) of the Novel COVID-19 virus isolate Wuhan-Hu-1 (GenBank Accession Number MN908947.3) when modelled … (King and Finley 2014; Schauer et al., 2019). Our results suggest that Aza-Peptide Epoxide, an irreversible protease inhibitor, and GRL0617, a viral replication inhibitor, could be used to develop inhibitors of the Novel Coronavirus COVID-19.

More than a decade has passed since the emergence of the human coronavirus that caused Severe Acute Respiratory Syndrome (SARS-CoV), and it is about 7 years since the emergence of another, the Middle East Respiratory Syndrome coronavirus (MERS-CoV).

… having the function of forming hexadecameric complexes and also acting as a processivity clamp for RNA polymerase and primase (Fehr et al., 2016). As in SARS-CoV, this structure in SARS-CoV-2 may be involved in the core RNA synthesis machinery and could serve as a template for exploring antiviral properties.

The phylogenetic tree of the seven polyproteins is shown in Fig. 2. It is seen that two …

The polyprotein also has an identity of 19.74 percent with an ABC-type uncharacterized …

The model built on template 3e9s from the PDB shows that this region of the coronavirus protein is homologous to the papain-like protease (PLpro) of SARS-CoV, whose ligand in the template is a noncovalent inhibitor known to be a potent inhibitor of viral replication in SARS (Ratia et al 2008).

Human-to-human transmission of this virus has been a concern, and due to this the search for …

Here, in this study, we performed a complete bioinformatic analysis, sequence alignment and comparison of multiple sequences of the SARS-CoV-2 whole genome sequences, the spike protein and the polyproteins for homology with known spike proteins, and we also analysed receptor binding sites for possible vaccine development.

Materials and Methods
Six complete viral genome sequences, seven polyproteins (RdRp region) and seven glycoproteins available on the NCBI portal on 4 Feb 2020 were taken for analysis. The sequence details and GenBank accession numbers are listed in Supplementary Table 1. In homology modelling, structural information is extracted from the template, and the sequence alignment is used to define insertions and deletions.
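The paper does not show how insertions, deletions or percent identity are read from its alignments. As a minimal sketch, assuming two gapped, equal-length aligned sequences with `-` as the gap character, the hypothetical helper `describe_alignment` below reports identity over ungapped columns (one common convention among several) and the runs of inserted or deleted residues relative to the template.

```python
from itertools import groupby


def column_state(q: str, t: str) -> str:
    """Classify one alignment column: query residue q against template residue t."""
    if q == "-" and t == "-":
        return "skip"        # gap in both rows carries no information
    if q == "-":
        return "deletion"    # template residue with no query counterpart
    if t == "-":
        return "insertion"   # query residue with no template counterpart
    return "aligned"


def describe_alignment(query: str, template: str):
    """Return (percent identity over ungapped columns, list of (kind, length) indel runs)."""
    if len(query) != len(template):
        raise ValueError("aligned sequences must have equal length")
    states = [column_state(q, t) for q, t in zip(query, template)]
    pairs = [(q, t) for q, t, s in zip(query, template, states) if s == "aligned"]
    identity = 100.0 * sum(q == t for q, t in pairs) / len(pairs) if pairs else 0.0
    # consecutive gap columns of the same kind form one insertion or deletion event
    indels = [(kind, len(list(run))) for kind, run in groupby(states)
              if kind in ("insertion", "deletion")]
    return identity, indels


identity, indels = describe_alignment("MKT-LLV", "MKTALIV")
print(round(identity, 2), indels)  # 83.33 [('deletion', 1)]
```

Other identity conventions (for example dividing by the shorter sequence length) give different numbers for the same alignment, so the convention should match the one used by the modelling server.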
The protein-ligand interaction profile, covering hydrogen bonding, hydrophobic interactions, salt bridges and π-stacking, was determined with the PLIP server (Salentin et al., 2015).
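PLIP's own detection uses atom typing, geometry and angle checks that are not reproduced here. The sketch below only illustrates the distance-cutoff idea behind such interaction profiles; the `Atom` type, the placeholder residue labels and the 4.0 Å and 4.1 Å cutoffs are assumptions chosen to resemble PLIP's defaults, not values taken from this study.

```python
import math
from dataclasses import dataclass

# Assumed cutoffs in angstroms, chosen to resemble PLIP's defaults (not from this study).
HYDROPHOBIC_MAX = 4.0
HBOND_MAX = 4.1


@dataclass
class Atom:
    residue: str   # placeholder label such as "RES_A" or "LIG"
    element: str   # element symbol: "C", "N", "O", ...
    xyz: tuple     # (x, y, z) coordinates in angstroms


def classify_contacts(protein_atoms, ligand_atoms):
    """Distance-only screen for hydrophobic contacts and hydrogen-bond candidates."""
    contacts = []
    for p in protein_atoms:
        for lig in ligand_atoms:
            d = math.dist(p.xyz, lig.xyz)
            if p.element == "C" and lig.element == "C" and d <= HYDROPHOBIC_MAX:
                contacts.append(("hydrophobic", p.residue, round(d, 2)))
            elif p.element in ("N", "O") and lig.element in ("N", "O") and d <= HBOND_MAX:
                # real hydrogen-bond detection also checks donor/acceptor roles and angles
                contacts.append(("hbond_candidate", p.residue, round(d, 2)))
    return contacts


protein = [Atom("RES_A", "C", (0.0, 0.0, 0.0)), Atom("RES_B", "O", (10.0, 0.0, 0.0))]
ligand = [Atom("LIG", "C", (3.5, 0.0, 0.0)), Atom("LIG", "N", (10.0, 3.0, 0.0))]
print(classify_contacts(protein, ligand))
# [('hydrophobic', 'RES_A', 3.5), ('hbond_candidate', 'RES_B', 3.0)]
```

Treating every carbon-carbon pair as hydrophobic and every N/O pair as a possible hydrogen bond overcounts interactions; it is meant only to show why the reported profile depends on the chosen cutoffs.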

Results and Discussion
The physico-chemical properties and primary structure parameters of the RdRp region of the seven polyproteins of the SARS-CoV-2 isolates are given in Table 1. The RNA-dependent RNA polymerase (RdRp) gene forms an important part of the viral genome; in RNA viruses, RdRp catalyzes the synthesis of the RNA strand complementary to a given RNA template.
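To make "complementary to a given RNA template" concrete, the toy function below pairs A with U and G with C and reverses the result so that both strands read 5'→3', reflecting the antiparallel orientation of the new strand. It illustrates base pairing only, not RdRp activity, and the function name is hypothetical.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Return the RNA strand complementary to a 5'->3' template, also written 5'->3'."""
    # Reversing accounts for the antiparallel orientation of the new strand.
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


print(complementary_strand("AUGGCU"))  # AGCCAU
```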
The orf1ab polyproteins of isolates SI200040-SP and SI200121-SP had 2 reading frames, whereas the rest of the isolates had 3 reading frames. The presence of multiple reading frames suggests the possibility of overlapping genes, as seen in many viral, prokaryotic and mitochondrial genomes, and this could affect how the proteins are made. All polyproteins had the same number of amino acid residues except that of isolate SI200040-SP, which had one residue more than the others. The extinction coefficients of the orf1ab polyproteins of SI200040-SP and SI200121-SP were much higher than those of the rest of the polyproteins.
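This section does not say which tool assigned the reading frames. As a sketch, assuming forward frames numbered 1 to 3 from the first nucleotide and the standard genetic code, three-frame translation can be written as below; `X` marks codons containing non-ACGT characters and `*` marks stop codons. All names are hypothetical.

```python
# Standard genetic code in the compact TCAG ordering (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int) -> str:
    """Translate a DNA/RNA sequence in forward frame 1, 2 or 3, ignoring a trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame - 1, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq: str) -> dict:
    return {frame: translate(seq, frame) for frame in (1, 2, 3)}


print(three_frame_translation("AATGGCCTAA"))  # {1: 'NGL', 2: 'MA*', 3: 'WP'}
```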
The extinction coefficient is important when studying protein-protein and protein-ligand interactions, since it is used to estimate protein concentration from absorbance at 280 nm. The instability index of these two polyproteins was also higher than that of the others, indicating that they are predicted to be unstable. Regulation of gene expression by polyprotein processing is known in many viruses that are human pathogens (Yost et al 2013). Like many other viruses, these isolates may use a replication strategy that involves translation of a large polyprotein followed by cleavage by viral proteases.
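The extinction coefficients in Table 1 are presumably computed from sequence alone. Assuming a ProtParam-style calculation (the tool is not named in this section), the value depends only on the Trp, Tyr and Cys counts, so the higher values for the SI200040-SP and SI200121-SP polyproteins would reflect more of these residues. A minimal sketch using the per-residue values of the ExPASy ProtParam tool (5500, 1490 and 125 M⁻¹ cm⁻¹ for Trp, Tyr and cystine):

```python
# Molar absorptivities at 280 nm in water (M^-1 cm^-1), as used by ExPASy ProtParam.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulfide-bonded pair of cysteines


def extinction_coefficients(protein: str):
    """Return (all Cys reduced, all Cys pairs forming cystines) estimates at 280 nm."""
    seq = protein.upper()
    n_trp, n_tyr, n_cys = seq.count("W"), seq.count("Y"), seq.count("C")
    reduced = n_trp * EPS_TRP + n_tyr * EPS_TYR
    oxidized = reduced + (n_cys // 2) * EPS_CYSTINE
    return reduced, oxidized


print(extinction_coefficients("MWYYCC"))  # (8480, 8605)
```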
The orf1ab polyproteins of SI200040-SP and SI200121-SP also showed shorter estimated half-lives than those of the other isolates, suggesting that they may be more susceptible to degradation.
The tertiary structure analysis of the isolate SARS-CoV-2 _HKU-SZ-001_2020 ORF1ab polyprotein is given in … The homology models of the 4409-residue region translated from the whole genome of the SARS-CoV-2 isolate Wuhan-Hu-1, with the ligands associated with templates 2a5i and 3e9s, are shown in Fig. 4 and Fig. 5, respectively.
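The 4409-residue region can be checked against the span 254 to 13480 quoted in the abstract for MN908947.3, which must be nucleotide positions since it is said to be read in Frame 2. Assuming 1-based inclusive coordinates and frames numbered from the first nucleotide of the genome, the arithmetic below recovers frame 2 and 4409 codons; whether the span ends in a stop codon is not stated, so the count is of codons rather than amino acids. The helper names are hypothetical.

```python
def reading_frame(start_nt: int) -> int:
    """Forward reading frame (1, 2 or 3) of a 1-based nucleotide start position."""
    return (start_nt - 1) % 3 + 1


def codon_count(start_nt: int, end_nt: int) -> int:
    """Number of complete codons in an inclusive 1-based nucleotide span."""
    return (end_nt - start_nt + 1) // 3


print(reading_frame(254), codon_count(254, 13480))  # 2 4409
```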
The statistics of the structural comparison with PDB templates are given in Table 5. The SARS-CoV-2 proteins are structurally close to the SARS-CoV proteins, and the amino acid alignment in the binding region is the same in both viruses.
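Table 5 is not reproduced in this section, so the exact statistics it reports are not known here. One widely used measure of structural closeness is the root-mean-square deviation (RMSD) between equivalent atoms, for example Cα atoms, after superposition. The sketch below computes RMSD for two coordinate lists that are assumed to be already superposed and paired; the superposition step itself (for example the Kabsch algorithm) is omitted.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """RMSD = sqrt(mean squared distance) over paired, already-superposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
```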
The alignment of the 305 residues from 3268-3573 aa of the Novel Coronavirus COVID-19 with the template 2a5i is shown in Fig. 6, and the alignment of the 315 residues from 1568-1882 aa of the Novel Coronavirus COVID-19 with the template 3e9s is shown in Fig. 7. The two parts of the Main protein from the whole genome of SARS-CoV-2 aligned with two SARS proteins, and their ligand binding sites were similar; the alignment positions, number of amino acids, ligands and interacting residues are given in Table 3. … Table 2; comparing the two, the binding properties are the same except for the presence of a water bridge in the template 2a5i.
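Figs. 6 and 7 number residues 3268-3573 and 1568-1882 within the translated region. If that numbering starts with residue 1 at nucleotide 254 of MN908947.3 (an assumption; the numbering origin is not stated in this section), each residue range can be mapped back to genome coordinates as below, which helps locate the two modelled regions on the genome. The helper names are hypothetical.

```python
# Assumed: residue 1 of the translated region starts at nucleotide 254 (1-based).
REGION_START_NT = 254


def residue_to_nt_span(residue: int, region_start_nt: int = REGION_START_NT):
    """Inclusive 1-based nucleotide span of the codon encoding a 1-based residue."""
    first = region_start_nt + 3 * (residue - 1)
    return first, first + 2


def residue_range_to_nt(first_res: int, last_res: int, region_start_nt: int = REGION_START_NT):
    """Nucleotide span covering residues first_res..last_res inclusive."""
    start, _ = residue_to_nt_span(first_res, region_start_nt)
    _, end = residue_to_nt_span(last_res, region_start_nt)
    return start, end


print(residue_range_to_nt(3268, 3573))  # (10055, 10972)
print(residue_range_to_nt(1568, 1882))  # (4955, 5899)
```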
The comparison of the hydrophobic interactions, hydrogen bonding and π-stacking of the constructed model of the Novel Coronavirus protein region 1568-1882 aa with the small-molecule noncovalent lead inhibitor against those of the template 3e9s is given in Suppl. Table 3; comparing the two, the binding properties are the same except for an additional π-stacking at Tyr in the template 3e9s. This suggests that these antiviral compounds are likely to bind to the regions of the Novel Coronavirus protein that are homologous to the SARS protein. The hydrophobic interactions of the ligand AZP with the SARS-CoV-2 protein and with the template 2a5i of SARS-CoV are compared in Fig. 11, and those of the noncovalent lead inhibitor with the SARS-CoV-2 protein and with the template 3e9s of SARS-CoV are compared in Fig. 12. In both cases the interactions are the same and involve the same amino acids, suggesting that these ligands with antiviral properties may also bind to the corresponding proteins of the new virus.

…2 has homology with this, and the binding sites for this compound in the structural protein of SARS-CoV-2 are the same (Table 4). In SARS-CoV, this compound inhibits the enzyme required for cleavage of the viral polyprotein; this enzyme also cleaves ubiquitin and has structural homology with the deubiquitinases (…
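The comparisons in Suppl. Table 3 and Figs. 11-12 amount to comparing two sets of interaction records. As a sketch, assuming each record is reduced to an (interaction type, residue) pair, shared and unique interactions can be listed as below; this is not PLIP's output format, and the residue labels in the example are placeholders rather than data from this study.

```python
def compare_profiles(model, template) -> dict:
    """Compare two collections of (interaction_type, residue) records."""
    model, template = set(model), set(template)
    return {
        "shared": sorted(model & template),
        "model_only": sorted(model - template),
        "template_only": sorted(template - model),
    }


model_profile = [("hbond", "RES_B"), ("hydrophobic", "RES_A")]
template_profile = model_profile + [("pi_stacking", "TYR_X")]
print(compare_profiles(model_profile, template_profile)["template_only"])
# [('pi_stacking', 'TYR_X')]
```

An extra template-only π-stacking record, as reported for Tyr, would show up in `template_only` while the remaining interactions fall under `shared`.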