Bioinformatics Bioengineering Perspectives

Content Type

User Generated

User

onqe1410

Subject

Science

Description

I have 5 questions. Everything is clear in the attached file.

Unformatted Attachment Preview

BME 6762 – BIOINFORMATICS: BIOENGINEERING PERSPECTIVES
Fall 2017

Instructor: Dr. P. S. Neelakanta (EE. 96/Rm517)
Telephone: 561/297-3469
Fax: 561/297-2800
E-Mail: neelakan@fau.edu

Assignment A
FUNDAMENTALS
(General reference: Chapters 1 and 2 of the prescribed text-books and Tutorial: SET I)

Submission Instructions
Due date: By 13th October 2017
Format: 1 hard copy, type-written, given/posted to the Instructor by the due date; 1 soft copy, WORD document (unzipped), e-mailed to the Instructor

Problem # A.1
Suppose the following represent FOUR hypothetical DNA message strands as designated. In each case, write down the following: (i) the complementary DNA strand; (ii) the pre-RNA strand; (iii) assuming that the triplets indicated in red correspond to intron regimes, the possible mRNA; and (iv) the decoded message, giving the designated amino-acid (AA) sequences in the standard single-letter and three-letter formats.

Given DNA sequences:
(a) 5′…CAC GCA TCG AAT CGG TAT AAA GCT CCC TTA ATC…3′
(b) 5′…GGG CCG ACA CAC ATC CTA CTC CTC GAG CTT GAG…3′
(c) 5′…AAC GCA CCG AAT TGG TAT AAG GTT GGG TCA CTC…3′
(d) 5′…CGG CGG ACA CAC ACC GTA ATC CCC GAT ATT CAG…3′

Hints:
(i) Complementary strand: the 3′-5′ Watson-Crick pair-matched sequence.
(ii) mRNA: the complementary strand with the introns spliced out and U replacing T.
(iii) Use the Tables on triplet/amino-acids.

Example solution
(i) (a) 3′…GTG CGT AGC TTA GCC ATA TTT CGA GGG AAT TAC…5′
(ii) (a) 3′…GUG CGU AGC UUA GCC AUA UUU CGA GGG AAU UAC…5′
(iii) (a) 3′…GUG GCC AUA UUU CGA GGG…5′

Problem # A.2
Each of the following hypothetical strands of amino acids denotes a subsection of a tRNA. In each case, identify its associated RNA strand.
(1) 5′… Ile Met Glu Leu Leu Ser Gly Stop … 3′
(2) 5′… R Y U U A Q H C M U T Z … 3′
(3) 5′… O O T H W R O P R X … 3′
(4) 5′… Phe Glu Met Pro Leu Ala Gly Stop Asp … 3′

Hints:
(a) Use the Tables on triplet/amino-acids.
(b) Back-translating protein to RNA (or DNA) is not advocated. It is difficult to inverse-translate amino acids into RNA because there are only 20 amino acids but 64 different triplet variations of the bases T, C, A and G. Therefore, different variations of bases (in the third, wobbling position) can result in the same amino acid.

Example solution
(1) 5′… Ile Met Glu Leu Leu Ser Gly Stop … 3′
5′… AU(U,C,A) AUG GA(A,G) CU(U,C,A,G) CU(U,C,A,G) UC(U,C,A,G) GG(U,C,A,G) UA(A,G) … 3′

Problem # A.3
The following are two pairs of hypothetical triplet sequences. Considering each pair, do the sequences in the pair represent an identical encoded message (mRNA) in the context of the Central Dogma? If so, justify.

Pair I
5′… TTA CTT ATT GTT TCT GCT TTA ATC…3′
5′… TTA CTG ATT GTG TCG GCG TTA ATG…3′

Pair II
5′… TTG CTT ATT GTT TAT GCT TTA ATC…3′
5′… ATT TCT ATT GTG CAT CGG TTA GGG…3′

Hints: Use the Tables on triplet/amino-acids and write down the AAs of the triplets given in each sequence. Hence conclude.

Problem # A.4
When a hypothetical DNA strand is analyzed, the following are details on the counts of the bases A, C, T and G across 10 segments (in windows of unequal residue lengths). Identify, with reasons, those segments that can most likely be regarded as noncodons.

Individual residue counts (approximate) in each segment:

Segment ID   Total number of residues   A     C      T     G
I            1020                       252   247    250   271
II           796                        64    332    90    310
III          1608                       539   358    264   447
IV           3002                       664   1108   412   818
V            1228                       300   319    316   293
VI           2010                       537   483    519   471
VII          1616                       346   553    350   367
VIII         808                        198   225    200   185
IX           646                        158   165    160   163
X            2022                       627   269    234   892

Hint: In the noncodon section, the four residues are approximately “equally likely” to occur.
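The hint for Problem A.4 can be explored numerically. The following is only a minimal sketch: it computes the fraction of each base per segment and flags segments whose fractions all lie close to 0.25. The tolerance `tol=0.03` and the helper names `base_fractions` and `is_near_uniform` are assumptions introduced here for illustration; the assignment does not prescribe a cutoff, and a formal goodness-of-fit test could be used instead.

```python
# Base counts (A, C, T, G) per segment, copied from the table in Problem A.4.
SEGMENTS = {
    "I": (252, 247, 250, 271),
    "II": (64, 332, 90, 310),
    "III": (539, 358, 264, 447),
    "IV": (664, 1108, 412, 818),
    "V": (300, 319, 316, 293),
    "VI": (537, 483, 519, 471),
    "VII": (346, 553, 350, 367),
    "VIII": (198, 225, 200, 185),
    "IX": (158, 165, 160, 163),
    "X": (627, 269, 234, 892),
}


def base_fractions(counts):
    """Return the relative frequency of each base in one segment."""
    total = sum(counts)
    return tuple(c / total for c in counts)


def is_near_uniform(counts, tol=0.03):
    """True if every base fraction lies within tol of 0.25.

    tol is an assumed, illustrative cutoff, not a value from the assignment.
    """
    return all(abs(f - 0.25) <= tol for f in base_fractions(counts))


if __name__ == "__main__":
    for name, counts in SEGMENTS.items():
        fractions = ", ".join(f"{f:.3f}" for f in base_fractions(counts))
        print(f"{name:>4}: {fractions}  near-uniform={is_near_uniform(counts)}")
```

Segments close to the chosen tolerance deserve a second look, since the verdict for a borderline segment depends on the assumed cutoff.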
Problem # A.5
Plot the associated hydropathy (index) profiles along the following amino-acid sequences:
(1) 5′… Phe Met Ile Leu Thr Ser Ser His Gly Stop … 3′
(2) 5′… A R Q U T K W H O P R T Z … 3′
(3) 5′… Leu Thr Ile Ser Val Ser Ala His Arg Ter … 3′
(4) 5′… E D Q Y T N W G C I F V (*) … 3′
Hint: Use the hydropathy table on AA residues.

TUTORIAL
(Excerpted from Neelakanta’s Book in Progress)

THE CENTRAL DOGMA – WHAT IS IT?

The central dogma of molecular biology refers to the principle governing the flow of genetic information from DNA to RNA and finally enabling the translation of proteins. It was enunciated by Francis Crick [F. H. C. Crick: Central dogma of molecular biology. Nature, 1970, vol. 227, 561–563] as described by James Watson in [J. D. Watson: The Molecular Biology of the Gene. W. A. Benjamin, Inc., New York, NY: 1965]. The central dogma addresses the underlying protocol on how the DNA defines the synthesis of protein by way of an RNA intermediary, passing through four major stages, namely, replication, transcription, processing, and translation. “The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred from protein to either protein or nucleic acid”. This is illustrated in Figure 2.1.2.

[Figure 2.1.2: DNA is transcribed to messenger RNA in the cell nucleus, which is translated to protein in the cytoplasm. Pertinent to a DNA, the central dogmatic stages defining the hierarchical steps of synthesis of protein using an RNA intermediary.]

“Once (sequential) information has passed into protein it cannot get out again” [F. H. C. Crick: On protein synthesis. Symposium of the Society of the Experimental Biology: XII, 1958, pp. 138-163].
The sequence of DNA into protein-making is the central dogma of biology, illustrated in Figure 2.1.3.

[Figure 2.1.3: DNA → RNA (transcription at the cell nucleus) → protein (translation at the cytoplasm). Transcription to translation steps.]

The flow of central dogmatic processes involved step-by-step towards protein synthesis is further illustrated in Figure 2.1.4. Whenever proteins are needed, the corresponding genes of the DNA are first transcribed into ribonucleic acid, or RNA. In RNA, the nucleotide base uracil (U) replaces the thymine present in DNA. This process of a DNA gene being made into RNA is known as transcription. Thus, the RNA depicts a single strand derived from the DNA part, carrying instructions out of the nucleus to places as needed throughout the cell. It thus serves as a messenger RNA (mRNA), as indicated in Figure 2.1.4. Eventually, the mRNA encoded with genetic information derived via the transcription process serves as a recipe on how to build a protein molecule in the ribosomal section of the cell, as shown in Figure 2.1.5. The flow of genetic information from the genes determines the protein composition and thereby the functions of the cell. Proteins perform important tasks for the cell functions and serve as the basic building blocks of living systems.

In the transcription stage of realizing the RNA derived from the DNA, the associated change in nucleotide chemistry is as follows: The sugar in DNA is deoxyribose, as described in Appendix 2.x, and in the RNA this sugar is ribose. Correspondingly, the nitrogenous base uracil (U) appears in RNA in lieu of T; U and T are very similar bases, and the complementary pairing (U ↔ A) in the RNA is identical to (T ↔ A) in the DNA. The DNA is initially situated in the nucleus, organized into chromosomes within the cell; as such, each cell holds the genetic information. (The DNA is duplicated whenever a cell divides; this duplication is known as replication.)
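The base-pairing rules described above (A ↔ T and G ↔ C in DNA, with U taking the place of T in RNA) can be sketched in a few lines. This is a minimal illustration, assuming upper-case strands over {A, C, G, T} only; the function names `complement` and `to_rna` are hypothetical helpers, and the complement is written base-by-base in the same left-to-right order as the input, following the convention of the example solution in Problem A.1.

```python
# Watson-Crick pairing for DNA bases.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(DNA_PAIR[base] for base in strand)


def to_rna(strand):
    """Rewrite a DNA strand in the RNA alphabet: uracil (U) replaces thymine (T)."""
    return strand.replace("T", "U")


if __name__ == "__main__":
    unzipped = "AATCGT"  # short illustrative strand
    paired = complement(unzipped)
    print(unzipped, "->", paired, "->", to_rna(paired))
```

Because the complement is not reversed, a strand read 5′ to 3′ yields a result that reads 3′ to 5′, which is how the assignment's example writes it.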
The pre-mRNA stage indicated in Figure 2.1.4 is the result of “processing” of the unzipped complementary strand of the helical DNA structure. Processing involves the removal of non-coding parts from the unzipped complementary strand and, subsequently, transportation of the chain out of the nucleus. Next, the proteins are built outside the nucleus in the cell, based on the code inscribed in the mRNA, which is translated into a corresponding protein complex.

In short, the central dogma of molecular biology governs the flow of genetic information from DNA to RNA and finally enables the translation of proteins. It was originally stated by Francis Crick and described by James Watson. It addresses the concept of how DNA defines the synthesis of protein by way of an RNA intermediary via four major stages: namely, replication, transcription, processing, and translation. Replication implies that the DNA can self-replicate in the nucleus of a cell by using one strand of the double helix as a template. (This is true for a eukaryote. A prokaryote, an organism such as a bacterium, has no nucleus; as such, its DNA replicates in the cytoplasm of this single-celled organism.) The DNA also codes for the production of mRNA in a process called transcription. As mentioned before, the structure of RNA is similar to that of DNA, except that RNA exists as a single-stranded unit with uracil replacing its DNA counterpart, thymine. In eukaryotic cells, the mRNA is processed and migrates from the nucleus to the cytoplasm. In summary, transcription implies:
◦ RNA synthesis within the cell nucleus
◦ DNA converted to a single-stranded RNA (nRNA), which is then processed to a matured RNA (mRNA) known as messenger RNA

The mRNA is then transported out of the nucleus and into the cytoplasm of eukaryotes, where proteins are formed through a process called translation.
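The processing step (removal of non-coding parts and joining of what remains) can be illustrated with the pre-RNA from the example solution of Problem A.1. The sketch below is a simplification under stated assumptions: introns are treated as whole triplets at known positions, and the positions used are inferred here by comparing the example pre-RNA with the example mRNA, not taken from the original red markings. The function name `splice` is a hypothetical helper.

```python
def splice(triplets, intron_positions):
    """Drop the triplets at the given 0-based positions; keep the rest in order."""
    skip = set(intron_positions)
    return [t for i, t in enumerate(triplets) if i not in skip]


# Pre-RNA of strand (a), as written in the example solution of Problem A.1.
pre_mrna = "GUG CGU AGC UUA GCC AUA UUU CGA GGG AAU UAC".split()

# Assumed intron positions, inferred from the example mRNA of Problem A.1.
introns = [1, 2, 3, 9, 10]

mrna = splice(pre_mrna, introns)

if __name__ == "__main__":
    print(" ".join(mrna))
```

Treating introns as triplet-aligned blocks matches the way the assignment marks them; it is not meant as a model of how splice sites are recognized in a cell.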
Messenger RNA carries coded information to the ribosomes; the ribosomes “read” this information and use it for protein synthesis. In summary, the translation process refers to:
◦ The mRNA, when transported through the nuclear membrane to the cytoplasm, is translated into protein with the help of ribosomes.
◦ The ribosomes contain a variety of different proteins and an assortment of RNA molecules collectively known as ribosomal RNA (rRNA). These short-lived, but abundant rRNAs are involved in the binding of mRNA to the ribosome during the protein-making translation process.

The process of protein-making through the central dogma of molecular biology, illustrated in Figure I.1.A-w 1, involves first the transcribing of information in sections from the DNA strand into an intermediate polymer called messenger RNA (or mRNA), which is similar to DNA except that the sugar residue R is replaced with a slightly different one, namely ribose R′, and a base called uracil (U) replaces the thymine (T) of the DNA.

[Figure 2.1.4: Double-stranded DNA (dsDNA) with Crick-Watson base pairing; unzipped strand segment (gene with genetic information) A A T C G T; pre-mRNA T T A G C A; mRNA with coded information 5′ U U A G C A 3′. Transcription process relevant to a double-stranded DNA sequence.]

[Figure 2.1.5: mRNA with coded genetic information (5′ U U A G C A 3′) translated at the ribosome into protein. Translational process on the transcribed genetic information shown in Figure 2.1.4.]

The machinery of making proteins via translation involves step-by-step addition of amino acids to a growing protein chain by a ribozyme (called a ribosome).
That is, when mRNA is made available at the ribosomes, which act as message centers situated throughout the cell, the coded information in the mRNA is translated into a biochemical polymeric complex, as decided by a protocol of operations dictating which building blocks are needed and in what order, so that eventually an assembled composition, namely the protein, is synthesized. This operation happens with the help of ribosomal RNA. Ribosomal ribonucleic acid (rRNA) is the RNA component of the ribosome. It is the predominant material within the ribosome, which has two subunits, namely, the large subunit (LSU) and the small subunit (SSU). The LSU rRNA acts as a ribozyme, catalyzing the formation of peptide bonds in synthesizing the protein. Thus, the culmination of translation yields proteins made up of several or many polypeptides.

As is evident from the stages involved in protein-making, such polypeptide building blocks are chains of amino acids specified by triplets over the set {A, C, U, G}. In essence, the protein is the culminating result of a one-dimensional sequential arrangement of amino acids (residues) bearing information consistent with the genetic code, as read from an mRNA template; and the RNA itself is a template-copy of one of the organism's genes, derived via the steps of the central dogma as described above. Thus, the agenda of how the information in DNA is turned into protein is the tale of the central dogma in molecular biology.

The amino acids participating as residues in ultimately deciding the protein composition (as per the genetic code) are specified by triplets, or codons, formed from the alphabet set of the DNA, namely {A, C, T, G}. Taking the bases in triplets results in a set of codons with a cardinality of sixty-four (= 4³). In terms of these triplets (or codons) grouped appropriately, a set of twenty distinct amino acids is prescribed.
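The counting argument above (4³ = 64 triplets grouped into twenty amino acids) can be checked directly. The sketch below enumerates all triplets over {T, C, A, G} and pairs them with the standard genetic code written as a 64-character string in T, C, A, G order, a common compact encoding; the names `BASES`, `AMINO` and `CODON_TABLE` are chosen here for illustration, and `*` denotes a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest, matching the order of AMINO.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons map to each amino acid (the degeneracy of the code).
DEGENERACY = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    print("codons:", len(CODON_TABLE))
    print("amino acids:", len(set(CODON_TABLE.values()) - {"*"}))
    for aa, n in sorted(DEGENERACY.items()):
        print(aa, n)
```

The uneven degeneracy (six codons for some amino acids, one for others) is what makes back-translation from protein to RNA ambiguous.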
These twenty amino acids are encoded by the standard genetic code and are called proteinogenic or standard amino acids, as tabulated in Appendix 2x. In summary, the stages of the central dogma of molecular biology indicate how the DNA is transcribed into mRNA, which is then translated into amino acids and eventually into protein.

Once the mRNA is made, it is trimmed down to a final size and sent out of the nucleus; and as the mRNA gets into the cytoplasm within the cell, it is framed into the desired protein. In essence, the "messenger", namely, the nucleic acid (dubbed mRNA), carries the genetic instructions from the DNA into the cytoplasm. That is, the mRNA depicts a copy of a gene and serves the function of bridging the entities DNA and protein. Thus, the gene part of the DNA transcribed in each chromosome provides the instructions for a protein to be made. The statistical features of the order of nucleotide bases in the DNA specify the random order of amino acid residues coded in the mRNA; and the stochastic profile of these residues sequenced in the mRNA specifies the posentropy (or information in Shannon’s sense) in deciding the protein complex synthesized. (The informatic aspects of such message-bearing entities are discussed in Chapter I.2.)

Subsequent to transcription, the translation process conforms to the mRNA being transported into the cytoplasm, where the encoded information in the gene-template is "decoded" or "translated" so as to produce the correct order of amino acids in a protein. This translation is enabled by the intervention of the RNA molecules listed below:
◦ Ribosomal RNA (rRNA): These are RNA molecules that associate with certain proteins to form the ribosomes; each ribosome can accept two transfer RNAs (tRNAs) and one mRNA at a time.
◦ The tRNAs are small RNA molecules that carry a specific amino acid at one end and an anticodon region that recognizes and binds mRNA at the other end.
The tRNA that binds to a given mRNA codon determines what amino acid is added to the protein chain. Thus, the three RNAs (namely, mRNA, tRNA, and rRNA) collectively turn the genetic information in the DNA into an eventual 3-dimensional protein. In all, translation is the process by which the nucleotide sequence of mRNA is converted to the amino acid sequence of a polypeptide. (In prokaryotic single-celled organisms, such as the bacteria, the translation process takes place in the cytoplasm.) The steps can be stated as illustrated in Figure I.1.A: DNA is transcribed to messenger RNA in the cell nucleus, which is translated to protein in the cytoplasm.

[Figure I.1.A-1.x: DNA (5′ to 3′) → transcription → precursor mRNA → mRNA → translation → protein → folded protein. Legend: intron, non-informative segment; exon, information-bearing segment; untranslated region (UTR). Making of a protein: the central dogma of a DNA defining the synthesis of protein using an RNA intermediary.]

[Figure I.1.A-1.x: dsDNA with WC matched-pairs; RNA polymerase synthesizing the mRNA (A U G G A U …). Pre-mRNA synthesis through transcription.]

The genetic information contained originally in the DNA is carried forward when the bases are set in triplet forms. With the four bases {A, C, T, G}, the possible permuted triplets number 64 (= 4³), and they are grouped into 20 amino acids; each amino acid, being specified by triplets of bases, bears the mapped genetic code of the DNA. The transcription occurs at a specific site on one strand of DNA known as the transcription initiation site, marked by a characteristic base sequence. The transcription proceeds through a specific chemical pairing, namely, the WC-pairing (A ⇔ T/U) and (G ⇔ C) mentioned earlier. The transcription process, in essence, is an information retrieval technique from the original memory units of the DNA.
In the subsequent process of translation, the information contained in the codons instructs the cell in what order the amino acids are to be strung together in making the protein constituents. The eventual (correct) translation of eukaryotic genomic data into a protein complex is, however, subject to the effects of mutations on the evolutionary conservation. Any underlying corruptions may manifest at the so-called splice junctions that separate/delineate two kinds of subsequence in a DNA sequence, namely, the (genetic) information-bearing codon segment (called an exon) and the non-informative “junk” segment, also known as a non-codon or intron. Exons bear the information necessary for protein-making, whereas non-codons are non-informative and their genetic role has not been fully elucidated. Exons and introns appear randomly along the DNA sequence, as shown in Fig. 2. Exons tend typically to be no more than 200 characters long, while non-codons can be tens of thousands of characters in length. Thus, introns make up the majority of a typical eukaryotic gene.

Towards the process of protein-making, introns are first scissored out of the sequence (in the processing stage) and the remaining exons are spliced together, constituting the mRNA, which is rendered ready for translation into a protein complex (at the cell interior). Should any errors have occurred (due to mutations), they would give room to the possibility of evolving wrong or cryptic splice junctions and lead to (imperfect) translations. That is, aberrant splice junctions may result from the mutational spectrum and would hamper the making of correct proteins.

In short, the concept of the central dogma, culminating in protein-making, can be summarized as follows:

Transcription: Relevant to a dsDNA helix, a portion of it that corresponds to a gene is first unzipped to form a messenger (mRNA), as shown in Figure I.1.A-r.
This unzipped part thus belongs to one side of the DNA, namely, the 3´-5´ reverse strand. In the unzipped part, the nucleotide base T is replaced by uracil (U). The resulting product is the pre-mRNA. Of its contents, namely, the exons and introns, the introns are removed and the exons are spliced together to constitute a mature mRNA, which leaves the nucleus to be translated by the ribosome. The sequence of DNA, which encodes the sequence of the amino acids in a protein, is thus copied into an mRNA chain.

[Figure I.1.A-1.x: Ribosomal subunits moving along the mRNA in the direction of translation; tRNAs with anticodons bring amino acids (Ala, Pro, Lys, Pro) to the growing polypeptide AA-chain (N-terminus to C-terminus), and the empty tRNA exits. Sequence of mRNA molecules control the forming of polymeric protein molecules towards protein synthesis. (The ribosomes instruct tRNAs to bring in specific amino acids into the sequence as dictated by mRNA, which itself was built on the basis of genetic information of the nucleotides in the original gene portion of the DNA.)]

The ribosome denotes a complex molecular machine, seen in all living cells, that offers the site at which the transcribed DNA (existing as the mRNA) is used for protein synthesis (the translation process). The task of the ribosomes is to link the amino acids (specified by nucleotide triplets, or codons) together in the order specified by mRNA molecules. Ribosomes consist of two major components: the small ribosomal subunit, which reads the RNA, and the large subunit, which joins amino acids to form a polypeptide chain. Each subunit is composed of one or more ribosomal RNA (rRNA) molecules and a variety of proteins. The ribosomes and associated molecules are also known as the translational apparatus. As stated earlier, the segment of a DNA that encodes a sequence of amino acids towards protein-making is copied many times into RNA chains.
The ribosomes can bind to such an mRNA chain and determine the correct sequence of amino acids. That is, amino acids are selected, collected, and carried to the ribosome by transfer RNA (tRNA) molecules that enter one part of the ribosome and bind to the mRNA chain. The tRNA is thus a specific type of RNA molecule that helps decode an mRNA sequence into a protein. The tRNAs function at specific sites in the ribosome during translation, that is, during the synthesis of a protein from an mRNA molecule. The proteins, as described above, are built from smaller units of amino acids; and each codon that represents a particular amino acid is recognized by a specific tRNA. The recognition process is as follows: The tRNA molecule has a characteristic folded structure with three hairpin loops that form the shape of a three-leafed clover, as shown in Figure I.1.A-t. One of these hairpin loops contains a sequence called the anticodon, which can recognize and decode an mRNA codon. Each tRNA has its corresponding amino acid attached to its end. When a tRNA recognizes and binds to its corresponding codon in the ribosome, the tRNA transfers the appropriate amino acid to the end of the growing amino acid chain. Then, the tRNAs and ribosome continue to decode the mRNA molecule until the entire sequence is translated into a protein.

[Figure I.1.A-t: Three-leafed clover shaped molecule of a tRNA with an example of anticodon, shown paired with a codon of the mRNA at the ribosome.]

The tRNA involved in the translation of the nucleic acid message into the amino acids of proteins is itself an RNA molecule having a conserved inverted L-structure. One end of the tRNA is an anticodon loop that pairs with an mRNA codon specifying a certain amino acid at that site. The other end of the tRNA has the amino acid attached to the 3'-OH group via an ester linkage.
A tRNA with an amino acid attached in this way is said to be "charged", and the enzyme that attaches the amino acid to the 3'-OH is called an aminoacyl-tRNA synthetase (aaRS). In all, a specific tRNA is designated for each of the 20 amino acids. Likewise, there is a specific aaRS for each tRNA. Only the first 2 nucleotides in the tRNA anticodon loop are strictly required for the decoding of an mRNA codon into an amino acid because, as stated before, the third nucleotide in any codon is less stringent inasmuch as it represents the "wobble" base; and the genetic code is degenerate, meaning that more than one codon can specify a single amino acid. As such, the anticodon of a tRNA can pair with more than one mRNA codon and still be specific for a single amino acid. For example, suppose the tRNA has the anticodon CCC, which is complementary to the codon GGG that specifies the amino acid glycine; apart from GGG, the three other codons GGA, GGU and GGC, which also specify glycine, can be recognized by tRNAs via the relevant anticodons. Thus, as shown in Figure I.1.A-t, the anticodon region of a tRNA is a sequence of three bases, GCC, that are complementary to a codon GGC in the messenger RNA. That is, for each coding trinucleotide (codon) in the mRNA, there is a distinct matching tRNA that carries the correct amino acid for that coding triplet. The attached amino acids are then linked together by another part of the ribosome. Thus, during translation, the bases of the anticodon form complementary base pairs with the bases of the codon by forming the appropriate hydrogen bonds; and this binding enables correct translation of the nucleic acid sequence into the amino acid sequence.

While DNA carries and stores the information for protein synthesis, the RNA is the entity that carries out the instructions encoded in the DNA; and the synthesis of correct proteins is critical in the functioning of cells and organisms.
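The codon-anticodon examples above (anticodon CCC for codon GGG, anticodon GCC for codon GGC) follow from strict base pairing between antiparallel strands. The sketch below assumes that both codon and anticodon are written 5′ to 3′ and that only strict A ↔ U and G ↔ C pairs are allowed; wobble pairing at the third codon position is deliberately not modeled. The function name `anticodon_for` is a hypothetical helper.

```python
# Strict Watson-Crick pairing for RNA bases (no wobble pairs).
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the anticodon (written 5'->3') that pairs with an mRNA codon (5'->3').

    The two strands pair antiparallel, so the codon is read in reverse
    while each base is complemented.
    """
    return "".join(RNA_PAIR[base] for base in reversed(codon))


if __name__ == "__main__":
    # The four glycine codons GGU, GGC, GGA, GGG and their strict-pairing anticodons.
    for third in "UCAG":
        codon = "GG" + third
        print(codon, "<->", anticodon_for(codon))
```

With wobble pairing, fewer than four tRNAs can cover the four glycine codons; the strict version shown here is the simplest case.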
There are three versions of RNA in the protein synthesis exercise, namely:
(i) mRNA – an entity framed from the DNA with a mapping of genomic information; that is, the messenger RNA carries the genetic information copied from DNA, framed as a nexus of “words” each constituted by a set of trinucleotides; and, as indicated before, each such triplet-word specifies a particular amino acid.
(ii) tRNA – also known as sRNA or soluble RNA, typically made of 76 to 90 nucleotides, is a key that deciphers the code words in mRNA. Each type of amino acid has its own type of tRNA, which binds it and carries it to the growing end of a polypeptide chain whenever the next code word on the mRNA requires it. That is, the correct tRNA with its attached amino acid is selected at each step because each specific tRNA molecule contains a three-base sequence that can base-pair with its complementary code word in the mRNA.
(iii) ribosomal RNA (rRNA) – an RNA that associates with a set of proteins to form complexes (ribosomes), which physically move along an mRNA molecule and catalyze the assembly of amino acids into protein chains. They also bind tRNAs and various accessory molecules necessary for protein synthesis. Ribosomes are composed of a large and a small subunit, each of which contains its own rRNA molecule or molecules.

The steps of translation can be briefly stated as follows:

Initiation: In the first step of translation, all the relevant components, mRNA, tRNA and ribosomal units, are brought together; that is, the mRNA entering the cytoplasm gets associated with ribosomes, where the tRNAs (each carrying a specific amino acid) pair up with the mRNA codons. That is, WC base-pairing (A ↔ U, G ↔ C) between mRNA codons and tRNA anticodons determines the order of the (information-bearing) stretch of amino acids in a protein.
Elongation: As the ribosome moves along the mRNA, the tRNAs transfer their amino acids one-by-one to the growing protein chain, thus producing the protein by linking the residues codon-by-codon.

Termination: When the ribosome hits a stop codon (UAA, UGA, or UAG), the function of the ribosome stops, terminating the making of the protein.

In the above description, the same mRNA may be used hundreds of times during translation by many ribosomes before it is degraded (broken down) by the cell. In all, the translation step covers the entire process by which the base sequence of an mRNA is used to join the amino acids in making a protein. All three types of RNA summarized above participate in this protein-synthesizing pathway in all cells, a molecular passage to the origin of life.

Once a protein is produced, it can fold to assume a specific three-dimensional structure intended for certain functional tasks. Some proteins may start folding into their correct form during the synthesis process itself. Thus, a protein refers to a molecule made of one or more chains of amino acids in a specific order. This order is decided by the base sequences of nucleotides in the gene coding for the protein. Proteins are essentially required for the structure, function and regulation of cells, tissues and organs. Each protein has a specific role, for example, as a hormone, an enzyme or an antibody. The following subsection elaborates more details on proteins and the related study.

The essence of genomic and proteomic considerations detailing genes, codons (exons and introns), amino acids, and the central dogma of molecular biology is explained in Chapter I.1. Relevantly, how a DNA codes for RNA, which in turn codes for proteins; other genetic aspects of heredity passed across the generations; and how the proteins, the complex molecules of life, are structured to carry out the functions needed for the survival of living systems are discussed.
[Figure 2.1.x: Double-stranded DNA (dsDNA); unwound sense strand of genomic DNA with 5′ UTR, exons, introns and 3′ UTR (UTR: untranslated region; CDS: coding sequence); transcription to pre-mRNA (T T A G C A) and mRNA (U U A G C A); translation to protein complex. Illustrations on central dogma of molecular biology.]

In short, the passage of the central dogma, which conforms to the transitions DNA-to-RNA-to-protein via three versions of RNA, namely, messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA), is outlined in Figure 2.1.x. As regards the details on the above, relevant examples, problems and project exercises are indicated in Chapter I.1. Also, more details on reading frames and ORFs are furnished therein. In this section, more considerations on genomic and proteomic sequence details are presented, consistent with the details of this chapter.

Amino acid (AA)

The 64 triplets indicated above can be grouped according to the 20 amino acids they specify; the amino acids are commonly referred to as the “building blocks” of proteins. As mentioned earlier, the sequence of nucleotides in a DNA molecule depicts the genetic information. The nucleotide sequence specifies amino acids from the set of 20; and a sequence of amino acids selected by such triplets may constitute a protein. Such a sequence representing a protein is encoded within the DNA sequence in question. The relation between the sequence of nucleotides and the corresponding amino acid sequence depicting the protein is called the genetic code. The characteristics of each amino acid depend on its side chain, and the amino acids can be divided into several classes. These classes are denoted as polar, nonpolar, acidic or basic. Figure I.1.a-y shows the structure of an amino acid.
[Figure I.1.a-y: Amino acid structure: a carbon atom bonded to an amino group (NH2), a carboxyl group (COOH) and a side chain; different side chains lead to different amino acids.]

Table I.i.A- The sixty-four (64 = 4³) codon triplets constituted from the base set {A, C, T, G} and grouping of the triplets into AAs

1st and 2nd    Triplet with 3rd nucleotide added, and amino acid
nucleotides
TT             TTT (F) Phe     TTC (F) Phe     TTA (L) Leu     TTG (L) Leu i
TC             TCT (S) Ser     TCC (S) Ser     TCA (S) Ser     TCG (S) Ser
CT             CTT (L) Leu     CTC (L) Leu     CTA (L) Leu     CTG (L) Leu i
CC             CCT (P) Pro     CCC (P) Pro     CCA (P) Pro     CCG (P) Pro
AT             ATT (I) Ile     ATC (I) Ile     ATA (I) Ile     ATG (M) Met i
AC             ACT (T) Thr     ACC (T) Thr     ACA (T) Thr     ACG (T) Thr
GT             GTT (V) Val     GTC (V) Val     GTA (V) Val     GTG (V) Val
GC             GCT (A) Ala     GCC (A) Ala     GCA (A) Ala     GCG (A) Ala
TA             TAT (Y) Tyr     TAC (Y) Tyr     TAA (*) Ter     TAG (*) Ter
TG             TGT (C) Cys     TGC (C) Cys     TGA (*) Ter     TGG (W) Trp
CA             CAT (H) His     CAC (H) His     CAA (Q) Gln     CAG (Q) Gln
CG             CGT (R) Arg     CGC (R) Arg     CGA (R) Arg     CGG (R) Arg
AA             AAT (N) Asn     AAC (N) Asn     AAA (K) Lys     AAG (K) Lys
AG             AGT (S) Ser     AGC (S) Ser     AGA (R) Arg     AGG (R) Arg
GA             GAT (D) Asp     GAC (D) Asp     GAA (E) Glu     GAG (E) Glu
GG             GGT (G) Gly     GGC (G) Gly     GGA (G) Gly     GGG (G) Gly

Note:
1. ATG ⇒ (M) Met i: Start codon. The codon ATG (or AUG in mRNA) provides the "start" (initiation) message for a ribosome, signaling the initiation of protein translation from mRNA. Hence, methionine appears in the N-terminal position of all proteins in eukaryotes and archaea during translation (although it is usually removed by post-translational modification).
2. TAA ⇒ (*) Ter: Stop codon
3. TAG ⇒ (*) Ter: Stop codon
4. TGA ⇒ (*) Ter: Stop codon
The three codons TAG, TAA and TGA (also referred to as the amber, ochre and opal codons, and corresponding to UAG, UAA and UGA in mRNA) terminate (stop) translation.
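The start and stop rules in the notes above can be put together into a short translation sketch. It assumes an mRNA written 5′ to 3′ over {A, C, G, U}, begins at the first AUG it finds, and stops at the first in-frame stop codon (or when no complete codon is left). The codon table is the standard code written as a 64-character string in U, C, A, G order; the function name `translate` and the sample sequence are illustrative assumptions, not part of the assignment.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (UUU, UUC, UUA, ...); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    Returns the peptide in one-letter codes, or "" if no start codon is found.
    """
    start = mrna.find("AUG")
    if start < 0:
        return ""
    peptide = []
    # Step through complete codons only.
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # UAA, UAG or UGA terminates translation
            break
        peptide.append(residue)
    return "".join(peptide)


if __name__ == "__main__":
    sample = "GGAUGGAUUUAUAACC"  # hypothetical mRNA fragment
    print(translate(sample))
```

Choosing a different reading frame or start position changes the result entirely, which is why the start codon matters as much as the table itself.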
Non-standard/non-canonical amino acids
A set of non-standard, non-coded amino acids, known as non-proteinogenic or "unnatural" amino acids, also exists. They are not encoded in the genetic code of any organism. Apart from the 23 proteinogenic amino acids (21 in eukaryotes) used by the translational machinery in constructing proteins, over 140 natural amino acids are known and thousands more combinations are indicated as being feasible [A. Ambrogelly, S. Palioura and D. Söll: Natural expansion of the genetic code. Nature Chemical Biology, 2007 vol.3(1), 29–35]. The non-proteinogenic amino acids are considered in practice in view of the following:
(i) They are intermediates in biosynthesis and are post-translationally incorporated into proteins.
(ii) They possess certain physiological roles (for example, as components of bacterial cell walls, neurotransmitters and toxins).
(iii) They may be natural or man-made pharmacological compounds.
(iv) They are present in meteorites and in certain prebiotic experiments.
The three extra proteinogenic amino acids are selenocysteine (Sec, U), pyrrolysine (Pyl, O) and N-formylmethionine. N-formylmethionine is an amino acid encoded by the start codon AUG in bacteria, mitochondria and chloroplasts, but it is removed post-translationally.

Table A-I.2. Standard amino acids: List of twenty distinct amino acids and their 3-letter and 1-letter codes [IUPAC-IUB Joint Commission on Biochemical Nomenclature. Nomenclature and Symbolism for Amino Acids and Peptides. European Journal of Biochemistry, 1984, vol. 138, 9–37]

Name                            3-letter code    1-letter code
Alanine                         Ala              A
Arginine                        Arg              R
Asparagine                      Asn              N
Aspartic Acid                   Asp              D
Cysteine                        Cys              C
Glutamine                       Gln              Q
Glutamic Acid                   Glu              E
Glycine                         Gly              G
Histidine                       His              H
Isoleucine                      Ile              I
Leucine                         Leu              L
Lysine                          Lys              K
Methionine                      Met              M
Phenylalanine                   Phe              F
Proline                         Pro              P
Selenocysteine                  Sec              U
Serine                          Ser              S
Threonine                       Thr              T
Tryptophan                      Trp              W
Tyrosine                        Tyr              Y
Valine                          Val              V
Aspartic acid or Asparagine     Asx              B
Glutamic acid or Glutamine      Glx              Z
Unspecified Amino Acid          Xaa              X

Nucleotides may undergo polymerization to form a long polynucleotide chain. A polynucleotide is denoted by prefixing “poly” to the repeating unit, for example, poly A (polyadenylic acid), poly T (polythymidylic acid), poly G (polyguanylic acid), poly C (polycytidylic acid) and poly U (polyuridylic acid). When a polynucleotide consists of the same repeating unit throughout, it is designated a homopolynucleotide.

Though the meaning of each codon is the same in most known organisms, the genetic code has been found to differ for a few codons in many mitochondria, in ciliated protozoans, and in Acetabularia, a single-celled green alga, as indicated in Table A-I.3. Such exceptions to the general code are regarded as due to later evolutionary developments.

Table A-I.3: Unusual codon usage in nuclear and mitochondrial genes [S. Osawa, TH Jukes, K. Watanabe and A. Muto: Recent evidence for evolution of the genetic code. Microbiology and Molecular Biology Reviews. 1992, vol. 56(1), 229-264]

Codon (triplet)    Standard universal code    Unusual code (3-letter format)    Occurrence in*
UGA                Stop                       Trp                               Mycoplasma, Spiroplasma, mitochondria of several species
CUG                Leu                        Thr                               Mitochondria in yeasts
UAA, UAG           Stop                       Gln                               Acetabularia, Tetrahymena, Paramecium, etc.
UGA                Stop                       Cys                               Euplotes

*The “unusual code” is used in nuclear genes of the listed organisms and in mitochondrial genes as indicated.
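The reassignments in the unusual-codon table can be treated as per-organism overrides on top of the standard code. The sketch below is illustrative only: the dictionary keys are hypothetical labels chosen here, `STANDARD` holds just the four codons the table mentions, and `codon_meaning` is an assumed helper name.

```python
# Standard meaning of the codons that appear in the unusual-codon table (mRNA form)
STANDARD = {"UGA": "Stop", "UAA": "Stop", "UAG": "Stop", "CUG": "Leu"}

# Reassignments per group; the key strings are hypothetical labels
VARIANT_CODES = {
    "mycoplasma_spiroplasma_some_mitochondria": {"UGA": "Trp"},
    "yeast_mitochondria": {"CUG": "Thr"},
    "acetabularia_tetrahymena_paramecium": {"UAA": "Gln", "UAG": "Gln"},
    "euplotes": {"UGA": "Cys"},
}


def codon_meaning(codon, group=None):
    """Return the variant meaning if the group reassigns the codon, else the standard one."""
    overrides = VARIANT_CODES.get(group, {})
    return overrides.get(codon, STANDARD[codon])


print(codon_meaning("UGA"))              # standard code
print(codon_meaning("UGA", "euplotes"))  # reassigned in Euplotes
```

Keeping the overrides separate from the standard table makes it clear that only a few codons differ between codes.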


Explanation & Answer


Bioinformatics: Bioengineering Perspectives
Question 1
Suppose the following represent FOUR hypothetical DNA message strands as designated.
In each case, write down the following: (i) Complementary DNA strand (ii) pre-RNA
strand and (iii) assuming that the triplets indicated in red correspond to intron regimes,
write down the possible mRNA and (iv) decode the message to find the designated amino-acid (AA) sequences in terms of standard single letter and three letter formats.
a)
5′…CAC GCA TCG AAT CGG TAT AAA GCT CCC TTA ATC…3′
(i)

Complementary DNA strand

3′…GTG CGT AGC TTA GCC ATA TTT CGA GGG AAT TAG…5′
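The complementary strand can be checked mechanically. This is a minimal sketch under the Watson-Crick pairing rule (A with T, C with G); the function name `complement` is an assumption made for illustration, and the strand is written in the same left-to-right order as the given 5′ to 3′ message, so the result reads 3′ to 5′.

```python
# Watson-Crick pairing: A<->T, C<->G
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Return the base-by-base complement; spaces between triplets are kept."""
    return strand.upper().translate(PAIRING)


message_a = "CAC GCA TCG AAT CGG TAT AAA GCT CCC TTA ATC"  # strand (a), 5' to 3'
print(complement(message_a))  # complementary strand, read 3' to 5'
```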
(ii)

pre-RNA strand

3’…GUG CGU AGC UUA GCC AUA UUU CGA GGG AAU UAG…5’
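Following the assignment's hint that the RNA strand is the complementary strand with U replacing T, the step can be sketched as a simple substitution. The function name `to_rna` is hypothetical.

```python
def to_rna(dna_strand):
    """Rewrite a DNA strand in the RNA alphabet by replacing T with U."""
    return dna_strand.upper().replace("T", "U")


complementary_a = "GTG CGT AGC TTA GCC ATA TTT CGA GGG AAT TAG"  # from part (i)
print(to_rna(complementary_a))
```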
(iii)

mRNA, assuming that the triplets indicated in red correspond to intron regimes

3’…GUG GCC AUA UUU CGA GGG …5’
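Splicing amounts to dropping the intron triplets from the pre-RNA. The red marking of the introns is not preserved in this text, so the positions used below (triplets 2, 3, 4, 10 and 11, counted from 1) are an assumption inferred by comparing the pre-RNA with the mRNA given above; `splice` and `INTRONS_A` are hypothetical names.

```python
def splice(pre_rna, intron_positions):
    """Remove the triplets at the given 1-based positions and join the rest."""
    triplets = pre_rna.split()
    kept = [t for i, t in enumerate(triplets, start=1) if i not in intron_positions]
    return " ".join(kept)


pre_rna_a = "GUG CGU AGC UUA GCC AUA UUU CGA GGG AAU UAG"
INTRONS_A = {2, 3, 4, 10, 11}  # assumed positions of the red (intron) triplets
print(splice(pre_rna_a, INTRONS_A))
```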
(iv)

decode the message to find the designated amino-acid (AA) sequences in terms of
standard single letter and three letter formats

3’…Val Ala Ile Phe Arg Gly…5’
3’…V A I F R G…5’
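The decoding step can be verified with a small lookup. This sketch covers only the six codons that occur in the mRNA of part (a), with values taken from the standard genetic code, and it reads the triplets left to right as written, which follows the convention of the assignment's example solution. The names `CODONS_A` and `decode` are illustrative assumptions.

```python
# Standard-code meaning of the codons in mRNA (a): (three-letter, one-letter)
CODONS_A = {
    "GUG": ("Val", "V"),
    "GCC": ("Ala", "A"),
    "AUA": ("Ile", "I"),
    "UUU": ("Phe", "F"),
    "CGA": ("Arg", "R"),
    "GGG": ("Gly", "G"),
}


def decode(mrna):
    """Return the three-letter and one-letter amino acid strings for an mRNA."""
    pairs = [CODONS_A[codon] for codon in mrna.split()]
    three_letter = " ".join(p[0] for p in pairs)
    one_letter = " ".join(p[1] for p in pairs)
    return three_letter, one_letter


three, one = decode("GUG GCC AUA UUU CGA GGG")
print(three)
print(one)
```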
b)
5′…GGG CCG ACA CAC ATC CTA CTC CTC GAG CTT GAG…3′
1) Complementary DNA strand
3′...