Biological and Medicinal Chemistry

Geographical Distribution of Amino Acid Mutations in Human SARS-CoV-2 Orf1ab Poly-Proteins Compared to the Equivalent Reference Proteins from China

Abstract

The amino acid mutations among 28,345 poly-protein sequences encoded by the human SARS-CoV-2 orf1AB gene, representing six geographical locations (Africa, Asia, Europe, North America, Oceania and South America), were identified by comparison with the equivalent reference poly-protein sequence derived from the first human SARS-CoV-2 genome sequence (Wuhan-Hu-1), reported from China. The mutations were analysed in three datasets: i) 27,956 poly-proteins comprising 7,096 amino acid residues, ii) 373 poly-proteins comprising 7,051-7,095 amino acid residues and iii) 16 poly-proteins comprising 7,097-7,099 amino acid residues. In all, 3,204 distinct mutation sites were observed among the poly-proteins comprising 7,096 amino acid residues, indicating that ~45% of the positions in the SARS-CoV-2 orf1AB poly-protein sequence have undergone mutation since the outbreak of the COVID-19 pandemic in December 2019. Fifteen proteins of the poly-protein sequence were associated with mutations, and the mutation propensities for the “leader protein”, nsp2, nsp3, nsp6, nsp7, nsp8 and endoRNAse proteins were higher (> 1) than those for the nsp4, nsp9, nsp10, 3C-like proteinase, RdRp, helicase, 3’-to-5’ exonuclease and 2’-O-ribose methyltransferase proteins. Relatively higher mutation percentages were observed for the RdRp (35.32%), nsp2 (26.42%), nsp3 (11.73%) and helicase (7.88%) proteins, whereas the mutation percentages for the remaining proteins ranged from 0.16% for the nsp10 protein to 4.11% for the 3’-to-5’ exonuclease protein. Five mutations (T265I in nsp2, T1246I in nsp3, G3278S in 3C-like proteinase, L3606F in nsp6 and P4715L in RdRp) were common across all six geographical locations. The P4715L RdRp mutation was predominant in all geographical locations except Africa, where the G5215S mutation was predominant. The maximum number of distinct mutation sites was observed for the nsp3 protein.
In the 373 orf1AB poly-protein sequences comprising 7,051-7,095 amino acid residues, deletion mutations were observed in the “leader protein” at positions 82-86 (GHVMV) and 141-143 (KSF). Among the 16 orf1AB poly-proteins comprising 7,097-7,099 amino acid residues, insertion mutations were observed in the nsp2 (517K), nsp3 (938E, 1901Y), nsp6 (3610F, 3611L) and 2’-O-ribose methyltransferase (7046F) proteins. In this work, all mutations observed among the 28,345 orf1AB poly-proteins of human SARS-CoV-2 relative to the reference sequence are presented.
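The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`length_class`, `call_substitutions`, `mutated_fraction`) are hypothetical, the sequences in the usage example are toy strings rather than real orf1AB data, and classifying a sequence by length alone is a simplifying assumption. Only the reference length (7,096), the notation style (for example T265I) and the count of 3,204 distinct sites come from the text above.

```python
from typing import List

# Length of the reference poly-protein, as stated in the abstract.
REFERENCE_LENGTH = 7096


def length_class(seq: str) -> str:
    """Assign a sequence to one of three length-based groups.

    Assumption: length alone decides the group. A sequence with both an
    insertion and a deletion could still have the reference length.
    """
    n = len(seq)
    if n == REFERENCE_LENGTH:
        return "same-length"
    return "shorter" if n < REFERENCE_LENGTH else "longer"


def call_substitutions(reference: str, query: str) -> List[str]:
    """List substitutions as <reference residue><1-based position><query residue>."""
    if len(reference) != len(query):
        # Sequences of different length need an alignment step first.
        raise ValueError("sequences must have equal length; align them first")
    return [
        f"{ref_aa}{pos}{query_aa}"
        for pos, (ref_aa, query_aa) in enumerate(zip(reference, query), start=1)
        if ref_aa != query_aa
    ]


def mutated_fraction(distinct_sites: int, length: int = REFERENCE_LENGTH) -> float:
    """Fraction of sequence positions with at least one observed mutation."""
    return distinct_sites / length


if __name__ == "__main__":
    # Toy sequences for illustration only (not real orf1AB data).
    toy_reference = "MKTAYIAK"
    toy_query = "MKIAYLAK"
    print(call_substitutions(toy_reference, toy_query))
    # Figures quoted in the abstract: 3,204 distinct sites over 7,096 residues.
    print(f"{mutated_fraction(3204):.2%}")
```

With the figures quoted in the abstract, 3,204 / 7,096 ≈ 0.45, which is consistent with the reported ~45%. Sequences of a different length from the reference would first have to be aligned (for example with a pairwise or multiple sequence aligner) before deletions and insertions could be located.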

Version notes

In the present work (Version 2.0), amino acid mutation analysis has been carried out on an enlarged dataset comprising 28,345 human SARS-CoV-2 orf1AB poly-protein sequences, representing six geographical locations, with respect to the reference sequence. Several new mutations observed in the enlarged dataset are reported. A catalogue of the updated set of substitution, deletion and insertion mutations is attached as Supplementary Data. Multiple mutation types observed at the same mutation site are also reported. The revised catalogue serves to identify geographical location-specific and protein-specific mutations. The analysis reports the distinct mutation sites, mutation types, mutation frequencies and common mutations observed among the poly-protein sequences across the different geographical locations in the enlarged dataset. It shows that the proportion of mutated positions in human SARS-CoV-2 orf1AB poly-protein sequences has risen to ~45% since the outbreak of the COVID-19 pandemic in December 2019.

Supplementary material

SUPPLEMENTARY-TABLE-1
Distinct mutation sites and mutation types observed in 27,956 poly-proteins of the human SARS-CoV-2 orf1AB gene comprising 7,096 amino acid residues and representing six geographical locations. The same mutation site can be associated with more than one mutation type.
SUPPLEMENTARY-TABLE-2
Mutation sites associated with more than one mutation type in the poly-proteins of human SARS-CoV-2 orf1AB genes from Asia, North America, Europe and Oceania.
SUPPLEMENTARY-TABLE-3
Mutations in human SARS-CoV-2 orf1AB poly-proteins comprising 7,051-7,095 amino acid residues that include deletion mutations (‘-’).
SUPPLEMENTARY-TABLE-4
Mutations in human SARS-CoV-2 orf1AB poly-proteins comprising 7,097-7,099 amino acid residues that include insertion mutations (‘-’).

Supplementary weblinks

COVID-19 Coronavirus pandemic status
Worldwide COVID-19 coronavirus cases, deaths, recovery status
NCBI Virus database
A community portal for viral sequence data
Phylogenetic analysis software for everyone
A web service for constructing and analyzing phylogenetic relationships between sequences
ABREAST™
A Bioinformatics, Research, Education, Services and Training Consultancy