New Code for DNA Nucleotide Sequences
2020-03-19T10:42:24Z (GMT) by
DNA consists of the complementary base pairs Adenine-Thymine (A-T) and Cytosine-Guanine (C-G), whose sequence encodes genes as well as the upstream initiation sites that enable transcription. Recently, this lab showed that steroid hormones are structurally symmetric with each of the four DNA nucleotide pairs and may enable gene transcription through an ionic binding process. Here, a new code is developed for DNA nucleotide sequences that relates to the initiation site for gene transcription. The structural code has two components: the orientation of steroid molecules in binding to DNA nucleotides, and the class of steroid molecules that form an intermolecular hydrogen bond with an available functional group of Thymine. This latter class describes a steroid hormone-DNA nucleotide-ion complex with three hydrogen bonds for A-T and T-A, matching the three internal hydrogen bonds associated with C-G and G-C. The code uses two binary vectors to characterize the four configurations of DNA nucleotides (A-T, T-A, C-G, and G-C) and is shown to be consistent with known regulatory elements of DNA sequences associated with gene transcription, including the TATA box and the E-box, along with other promoters. In addition, the code, which is bijective, is applied to analyze the DNA sequence associated with SARS-CoV-2 to identify regions with relevant structural characteristics.
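To make the idea of a bijective code built from two binary vectors concrete, the following is a minimal Python sketch. It is an illustration only: the abstract does not state which bit values correspond to which nucleotide configuration, so the `ENCODE` table, the function names `encode` and `decode`, and the reading of each base on a single strand as one of the four configurations (A as A-T, T as T-A, C as C-G, G as G-C) are all assumptions made here, not the author's assignment.

```python
# Hypothetical bit assignment: the abstract does not specify the actual
# mapping, so this table is an assumption chosen only for illustration.
# Each base on the read strand stands for one pair configuration:
# A -> A-T, T -> T-A, C -> C-G, G -> G-C.
ENCODE = {
    "A": (0, 0),
    "T": (0, 1),
    "C": (1, 0),
    "G": (1, 1),
}

# Inverting the table is possible because the four bit pairs are distinct,
# which is what makes the code bijective.
DECODE = {bits: base for base, bits in ENCODE.items()}


def encode(sequence):
    """Return two binary vectors (lists of 0/1), one entry per base."""
    pairs = [ENCODE[base] for base in sequence.upper()]
    first = [p[0] for p in pairs]
    second = [p[1] for p in pairs]
    return first, second


def decode(first, second):
    """Recover the nucleotide sequence from the two binary vectors."""
    if len(first) != len(second):
        raise ValueError("binary vectors must have equal length")
    return "".join(DECODE[(a, b)] for a, b in zip(first, second))


if __name__ == "__main__":
    # Short motifs named in the abstract: a TATA box core and a
    # canonical E-box sequence.
    for motif in ("TATA", "CACGTG"):
        v1, v2 = encode(motif)
        print(motif, v1, v2, decode(v1, v2))
```

Because each of the four configurations receives a distinct pair of bits, decoding the two vectors always returns the original sequence, which is the property the abstract refers to as bijective. Any other assignment of the four bit pairs to the four configurations would preserve that property.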