BME 6762 – BIOINFORMATICS: BIOENGINEERING PERSPECTIVES
Fall 2017
Instructor: Dr. P. S. Neelakanta (EE. 96/Rm517)
Telephone: 561/297-3469
Fax: 561/297-2800
E-Mail: neelakan@fau.edu
Assignment A
FUNDAMENTALS
(General reference: Chapters 1 and 2 of the prescribed text-books
and Tutorial: SET I)
Submission Instructions
Due date: By 13 October 2017
Format:
1 Hard-copy, type-written given /posted to the Instructor by due date
1 Soft-copy: WORD document unzipped e-mailed to the Instructor
--------------------------------------------------------------------------------------------------------------------
Problem # A.1
Suppose the following represent FOUR hypothetical DNA message strands as designated. In
each case, write down the following: (i) the complementary DNA strand; (ii) the pre-RNA strand;
(iii) assuming that the triplets indicated in red correspond to intron regimes, the
possible mRNA; and (iv) decode the message to find the designated amino-acid (AA) sequences
in terms of the standard single-letter and three-letter formats.
Given DNA sequences:
(a) 5′…CAC GCA TCG AAT CGG TAT AAA GCT CCC TTA ATC…3′
(b) 5′…GGG CCG ACA CAC ATC CTA CTC CTC GAG CTT GAG…3′
(c) 5′…AAC GCA CCG AAT TGG TAT AAG GTT GGG TCA CTC…3′
(d) 5′…CGG CGG ACA CAC ACC GTA ATC CCC GAT ATT CAG…3′
Hints:
(i) Complementary strand: It corresponds to the 3’-5’ Watson-Crick pair-matched sequence
(ii) mRNA refers to: With introns spliced out, the complementary strand has U replacing T.
(iii) Use the Tables on triplet/amino-acids
Example solution
(i) (a) 3’…GTG CGT AGC TTA GCC ATA TTT CGA GGG AAT TAG…5’
(ii) (a) 3’…GUG CGU AGC UUA GCC AUA UUU CGA GGG AAU UAG…5’
(iii) (a) 3’…GUG GCC AUA UUU CGA GGG …5’
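The three steps of the example solution can also be traced in code. The following is a minimal Python sketch, not part of the assignment itself; the function names are illustrative, and the intron positions (triplets 2 to 4 and 10 to 11 of strand (a)) are an assumption read off the example solution (iii).

```python
# Watson-Crick pairing for DNA bases.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(triplets):
    """Base-by-base complement of each triplet (triplet order unchanged)."""
    return ["".join(DNA_PAIR[b] for b in t) for t in triplets]

def to_rna(triplets):
    """Replace T with U in every triplet."""
    return [t.replace("T", "U") for t in triplets]

def splice(triplets, intron_positions):
    """Drop the triplets at the given 0-based positions and keep the rest."""
    return [t for i, t in enumerate(triplets) if i not in intron_positions]

# Strand (a) of Problem A.1, written 5' to 3'.
strand_a = "CAC GCA TCG AAT CGG TAT AAA GCT CCC TTA ATC".split()

comp = complement(strand_a)                 # step (i), read 3' to 5'
pre_rna = to_rna(comp)                      # step (ii)
mrna = splice(pre_rna, {1, 2, 3, 9, 10})    # step (iii), assumed intron positions
print(" ".join(mrna))
```

The same three calls can be applied to strands (b) to (d) once their intron positions are known.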
Problem # A.2
Each of the following hypothetical strands of amino acids denotes a subsection of a tRNA. In each
case, identify its associated RNA strand.
(1) 5′… Ile Met Glu Leu Leu Ser Gly Stop … 3′
(2) 5′… R O U T A W H O M R T … 3′
(3) 5′… Phe Glu Met Pro Leu Ala Gly Stop Asp … 3′
(4) 5′… Y O U H Q R C P U X Z … 3′
Hints:
(a) Use the Tables on triplet/amino-acids
(b) Back-translating protein to RNA (or DNA) is not advocated. It is difficult to inverse-translate
amino acids into RNA, as there are only 20 amino acids but 64 different
triplets of the bases T, C, A & G. Therefore, different triplets (differing mostly in the third,
"wobble" position) can result in the same amino acid.
Example solution
(1)
5′… Ile Met Glu Leu Leu Ser Gly Stop … 3′
5’ … AU(U,C,A) AUG GA(A,G) CU(U,C,A,G) CU(U,C,A,G) UC(U,C,A,G)
GG(U,C,A,G) UA(A,G) … 3’
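Hint (b) can be made concrete by inverting the codon table. The sketch below is illustrative only: it assumes the standard genetic code written in RNA letters, and the names `CODON_TO_AA`, `AA_TO_CODONS` and `back_translate` are introduced here for illustration. Note that the full standard code also admits UU(A,G) for Leu, AG(U,C) for Ser and UGA for Stop, in addition to the codons listed in the example solution.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in RNA letters; codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Invert the table: one-letter amino acid (or '*' for Stop) -> list of codons.
AA_TO_CODONS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS[aa].append(codon)

# Only the residues needed for strand (1).
THREE_TO_ONE = {"Ile": "I", "Met": "M", "Glu": "E", "Leu": "L",
                "Ser": "S", "Gly": "G", "Stop": "*"}

def back_translate(residues):
    """Return, for each residue, every codon that could encode it."""
    return [AA_TO_CODONS[THREE_TO_ONE[r]] for r in residues]

strand_1 = "Ile Met Glu Leu Leu Ser Gly Stop".split()
options = back_translate(strand_1)

n_strands = 1
for residue, codons in zip(strand_1, options):
    print(residue, codons)
    n_strands *= len(codons)
print("candidate RNA strands:", n_strands)
```

The product of the per-residue codon counts shows why back-translation is ambiguous: a short peptide already maps to thousands of candidate RNA strands.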
Problem # A.3
The following are two pairs of hypothetical triplet sequences. Considering each pair, do the
sequences in the pair represent an identical encoded message (mRNA) in the context of the Central
Dogma? If so, justify.
Pair I
5′… TTA CTT ATT GTT TCT GCT TTA ATC…3′
5′… TTA CTG ATT GTG TCG GCG TTA ATG…3′
-------------------------
Pair II
5′… TTG CTT ATT GTT TAT GCT TTA ATC…3′
5′… ATT TCT ATT GTG CAT CGG TTA GGG…3′
Hints:
Use the Tables on triplet/amino-acids and write down the AAs of the triplets given in each
sequence. Hence conclude
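The comparison suggested in the hint can be sketched as follows. This is an illustrative Python fragment that assumes the standard genetic code (written in DNA letters, as in the sequences above); the helper name `translate` and the `results` dictionary are introduced here only for the example.

```python
from itertools import product

# Standard genetic code in DNA letters; codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(triplets):
    """Translate a space-separated triplet sequence into one-letter AAs."""
    return "".join(CODON_TO_AA[t] for t in triplets.split())

pairs = {
    "Pair I": ("TTA CTT ATT GTT TCT GCT TTA ATC",
               "TTA CTG ATT GTG TCG GCG TTA ATG"),
    "Pair II": ("TTG CTT ATT GTT TAT GCT TTA ATC",
                "ATT TCT ATT GTG CAT CGG TTA GGG"),
}

results = {}
for name, (first, second) in pairs.items():
    aa1, aa2 = translate(first), translate(second)
    results[name] = (aa1, aa2, aa1 == aa2)
    print(name, aa1, aa2, "identical" if aa1 == aa2 else "different")
```

Two sequences carry the same message only if every triplet position decodes to the same amino acid, so a single differing residue is enough to conclude that the pair is not identical.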
Problem # A.4
When a hypothetical DNA strand is analyzed, the following are details on the counts of the
bases A, C, T and G across 10 segments (in windows of unequal residue lengths). Identify, with
reasons, those segments that can most likely be regarded as noncodons.
Segment   Total number    Individual residue counts (approximate) in each segment
ID        of residues     A       C       T       G
I         1020            252     247     250     271
II        796             64      332     90      310
III       1608            539     358     264     447
IV        3002            664     1108    412     818
V         1228            300     319     316     293
VI        2010            537     483     519     471
VII       1616            346     553     350     367
VIII      808             198     225     200     185
IX        646             158     165     160     163
X         2022            627     269     234     892
Hint: In the noncodon section, the four residues are approximately “equally likely” to occur.
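One way to apply the hint is to convert the counts into relative frequencies and measure how far each departs from 0.25. The sketch below is only illustrative: the tolerance of 0.03 is an assumed cutoff (the assignment does not prescribe one), and the names `SEGMENTS`, `TOLERANCE` and `max_deviation` are hypothetical.

```python
# Counts (A, C, T, G) per segment, copied from the table in Problem A.4.
SEGMENTS = {
    "I": (252, 247, 250, 271),
    "II": (64, 332, 90, 310),
    "III": (539, 358, 264, 447),
    "IV": (664, 1108, 412, 818),
    "V": (300, 319, 316, 293),
    "VI": (537, 483, 519, 471),
    "VII": (346, 553, 350, 367),
    "VIII": (198, 225, 200, 185),
    "IX": (158, 165, 160, 163),
    "X": (627, 269, 234, 892),
}

TOLERANCE = 0.03  # assumed cutoff for "approximately equally likely"

def base_fractions(counts):
    """Relative frequency of each base within one segment."""
    total = sum(counts)
    return [c / total for c in counts]

def max_deviation(counts):
    """Largest departure of any base frequency from the uniform value 0.25."""
    return max(abs(f - 0.25) for f in base_fractions(counts))

near_uniform = [seg for seg, counts in SEGMENTS.items()
                if max_deviation(counts) <= TOLERANCE]

for seg, counts in SEGMENTS.items():
    fractions = ", ".join(f"{f:.3f}" for f in base_fractions(counts))
    print(f"{seg:>4}: {fractions}  max deviation {max_deviation(counts):.3f}")
print("near-uniform segments:", near_uniform)
```

A stricter or looser tolerance changes which borderline segments are flagged, so the chosen cutoff should be stated along with the reasons asked for in the problem.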
Problem # A.5
Plot the associated hydropathy (index) profiles along the following amino-acid sequences:
(1)
5′… Phe Met Ile Leu Thr Ser Ser His Gly Stop … 3′
(2 )
5′… A R Q U T K W H O P R T Z … 3′
(3)
5′… Leu Thr Ile Ser Val Ser Ala His Arg Ter … 3′
(4 )
5′… E D Q Y T N W G C I F V (*) … 3′
Hint: Use the hydropathy table on AA residues
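A hydropathy profile is simply the hydropathy index of each residue listed against its position. The sketch below assumes the Kyte-Doolittle scale, which may differ from the hydropathy table prescribed for the course; the function name `hydropathy_profile` is illustrative. Letters that are not among the twenty standard residues (such as U, O or Z in sequences (2) and (4)) have no entry in this scale and are reported as `None`.

```python
# Kyte-Doolittle hydropathy index (assumed scale; substitute the course
# table if it differs).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Only the residues needed for sequence (1).
THREE_TO_ONE = {"Phe": "F", "Met": "M", "Ile": "I", "Leu": "L",
                "Thr": "T", "Ser": "S", "His": "H", "Gly": "G"}

def hydropathy_profile(residues):
    """Return (residue, index) pairs; residues absent from the scale get None."""
    return [(r, KD.get(r)) for r in residues]

seq_1 = [THREE_TO_ONE[r] for r in "Phe Met Ile Leu Thr Ser Ser His Gly".split()]
profile = hydropathy_profile(seq_1)

# Crude text plot: one row per position, bar length proportional to |index|.
for pos, (res, h) in enumerate(profile, start=1):
    if h is None:
        print(f"{pos:2d} {res}   n/a")
        continue
    sign = "-" if h < 0 else "+"
    print(f"{pos:2d} {res} {h:+5.1f} {sign}{'#' * round(abs(h) * 2)}")
```

The list of (position, index) pairs can be passed to any plotting tool to draw the profile requested in the problem.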
--------------------------------------------------------------------------------------------------------------------
TUTORIAL (Excerpted from Neelakanta’s Book in Progress)
THE CENTRAL DOGMA – WHAT IS IT?
The central dogma of molecular biology refers to the principle governing the flow of genetic
information from DNA to RNA and finally to the translation of proteins. It was enunciated
by Francis Crick [F. H. C. Crick: Central dogma of molecular biology. Nature, 1970, vol. 227,
561–563] as described by James Watson in [J. D. Watson: The Molecular Biology of the Gene.
W. A. Benjamin, Inc., New York, NY: 1965]. The central dogma addresses the underlying
protocol on how the DNA defines the synthesis of protein by way of an RNA intermediary,
passing through four major stages, namely, replication, transcription, processing, and
translation. “The central dogma of molecular biology deals with the detailed residue-by-residue
transfer of sequential information. It states that such information cannot be transferred from
protein to either protein or nucleic acid”. This is illustrated in Figure 2.1.2.
[Figure 2.1.2 diagram: DNA, transcribed to messenger RNA in the cell nucleus, translated to protein in the cytoplasm]
Figure 2.1.2 Pertinent to a DNA, the central dogmatic stages defining the hierarchical steps of
synthesis of protein using an RNA intermediary. “Once (sequential) information has passed into
protein it cannot get out again” [F. H. C. Crick: On protein synthesis. Symposia of the Society
for Experimental Biology: XII, 1958, pp. 138-163].
The sequence from DNA to protein-making is the central dogma of biology, illustrated in Figure
2.1.3.
[Figure 2.1.3 diagram: DNA to RNA (transcription at the cell nucleus), then RNA to PROTEIN (translation at the cytoplasm)]
Figure 2.1.3 Transcription to translation steps
The flow of central dogmatic processes leading step by step towards protein synthesis is
further illustrated in Figure 2.1.4. Whenever proteins are needed, the corresponding genes of the
DNA are first transcribed into ribonucleic acid, or RNA. In RNA, the nucleotide base uracil (U)
replaces the thymine present in DNA. This process of copying a DNA gene into RNA is known as
transcription. Thus, the RNA depicts a single strand derived from the DNA part, carrying
instructions out of the nucleus to places as needed throughout the cell. It thus serves as a
messenger RNA (mRNA), as indicated in Figure 2.1.4. Eventually, the mRNA encoded with
genetic information derived via the transcription process serves as a recipe on how to build a protein
molecule in the ribosomal section of the cell, as shown in Figure 2.1.5.
The flow of genetic information from the genes determines the protein composition and thereby
the functions of the cell. Proteins perform important tasks for the cell functions and serve as the
basic building blocks of living systems.
In the transcription stage of realizing the RNA derived from the DNA, the associated
change in nucleotide chemistry is as follows: The sugar in DNA is deoxyribose, as described in
Appendix 2.x, whereas in the RNA the sugar is ribose. In addition,
the nitrogenous base uracil (U) is used in RNA in lieu of T; here, U and T are
very similar bases, and the complementary pairing (U ↔ A) in the RNA is identical to (T ↔ A) in
the DNA.
The DNA is initially situated in the nucleus, organized into chromosomes within the cell;
and as such, each cell holds the genetic information. (The DNA gets duplicated whenever a cell
divides, and this duplication of the DNA is known as replication.)
The pre-mRNA stage indicated in Figure 2.1.4 is the result of “processing” of the
unzipped complementary strand of the helical DNA structure. Processing involves the removal
of non-coding parts from the unzipped complementary strand and, subsequently, transportation of
the chain out of the nucleus. Next, the proteins are built outside the nucleus in the cell. This is based
on the code inscribed in the mRNA, which is translated into a corresponding protein complex.
In short, the central dogma of molecular biology governs the flow of genetic information from
DNA to RNA and finally to the translation of proteins. As noted above, it was enunciated by
Francis Crick and described by James Watson, and it addresses how DNA defines the synthesis
of protein by way of an RNA intermediary via four major stages, namely, replication,
transcription, processing, and translation.
Replication implies that the DNA can self-replicate in the nucleus of a cell by using one
strand of the double helix as a template. (This is true for a eukaryote. A prokaryote, an organism
such as a bacterium, has no nucleus; as such, its DNA replicates in the cytoplasm of this single-celled
organism.) The DNA also codes for the production of mRNA in a process called transcription.
As mentioned before, the structure of
RNA is similar to that of DNA, except that RNA exists as a single-stranded unit with uracil
replacing its DNA counterpart, thymine. In eukaryotic cells, the mRNA is processed and
migrates from the nucleus to the cytoplasm. In summary, transcription implies:
◦ RNA synthesis within the cell nucleus
◦ DNA converted to a single-stranded RNA (nRNA), which is then processed into a matured RNA
(mRNA) known as messenger RNA
The mRNA is then transported out of the nucleus and into the cytoplasm of eukaryotes, where
proteins are formed through a process called translation. Messenger RNA carries coded
information to the ribosomes, and the ribosomes “read” this information and use it for protein
synthesis.
In summary, the translation process refers to:
◦ mRNA, once transported through the nuclear membrane to the cytoplasm, is
translated into protein with the help of ribosomes
◦ The ribosomes contain a variety of different proteins and an assortment of RNA molecules
collectively known as ribosomal RNA (rRNA). These short-lived, but abundant rRNAs
are involved in the binding of mRNA to the ribosome during the protein-making translation
process.
The process of protein-making through the central dogma of molecular biology illustrated in
Figure I.1.A-w 1 involves, first, the transcribing of information in sections from the DNA strand
into an intermediate polymer called messenger RNA (or mRNA), which is similar to DNA except
that the sugar residue R is replaced with a slightly different one, namely ribose R′; and, a base
called uracil (U) replaces the thymine (T) of the DNA.
[Figure 2.1.4 diagram: double-stranded DNA (dsDNA) with Crick-Watson base pairing (G C A T … C A G T over C G T A … G T C A); an unzipped strand segment of a gene with genetic information (5’ A A T C G T 3’ paired with 3’ T T A G C A 5’); transcription via pre-mRNA to mRNA (5’ U U A G C A 3’) with coded information]
Figure 2.1.4 Transcription process relevant to a double-stranded DNA sequence
[Figure 2.1.5 diagram: mRNA with coded genetic information (5’ U U A G C A 3’) at the ribosome; translation of mRNA to protein synthesis; protein]
Figure 2.1.5 Translational process on the transcribed genetic information shown in Figure 2.1.4
The machinery of making proteins via translation involves step-by-step addition of amino
acids to a growing protein chain by a ribozyme (called a ribosome). That is, when mRNA is
made available at the ribosomes, which act as message centers situated throughout the cell, the coded
information in the mRNA gets translated into a biochemical polymeric complex as decided by a
protocol of operations dictating which building blocks are needed and in what order, so that
eventually an assembled composition, namely, the protein, is synthesized.
The above operation happens with the help of ribosomal RNA. This ribosomal
ribonucleic acid (rRNA) is the RNA component of the ribosome. It is the predominant material
within the ribosome, which has two subunits, namely, the large subunit (LSU) and the small subunit
(SSU). The LSU rRNA acts as a ribozyme, catalyzing the formation of peptide bonds in
synthesizing the protein. Thus, the culmination of translation yields proteins made up of one
or more polypeptides. As is evident from the stages involved in protein-making, such
polypeptide building blocks are chains of amino acids specified by triplets of the set {A, C,
U, G}; and, in essence, the protein is the culminating result of a one-dimensional sequential
arrangement of amino acids (residues) bearing information consistent with the genetic code, as
read from an mRNA template; and, the RNA itself is a template-copy of one of the organism's
genes derived via the steps of central dogma as described above. Thus, the agenda of how the
information in DNA is turned into protein is the tale of central dogma in molecular biology.
The amino acids participating as residues in ultimately deciding the protein composition
(as per the genetic code) are specified by triplets, or codons, formed from the alphabet set of the
DNA, namely, {A, C, T, G}. Taking the bases in triplets results in a set of codons with a
cardinality of sixty-four (= 4^3). In terms of these triplets (or codons) grouped appropriately, a set
of twenty distinct amino acids is prescribed. These twenty amino acids are encoded by the standard
genetic code and are called proteinogenic or standard amino acids, as tabulated in Appendix 2x.
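The counting argument can be checked by enumeration. The following is a small illustrative sketch, assuming the standard genetic code; the 64-letter string `AMINO` lists the one-letter amino acid (or `*` for a stop) for each codon taken in T, C, A, G order, and the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# All ordered triplets over the four bases: 4 * 4 * 4 of them.
codons = ["".join(c) for c in product(BASES, repeat=3)]

# Standard genetic code, aligned with the enumeration order above.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip(codons, AMINO))

# How many codons map to each amino acid (the degeneracy of the code).
degeneracy = Counter(CODON_TO_AA.values())
n_stop = degeneracy.pop("*")

print("codons:", len(codons))
print("amino acids:", len(degeneracy))
print("stop codons:", n_stop)
print("codons per amino acid:", dict(degeneracy))
```

Because 61 sense codons are shared among 20 amino acids, most amino acids are necessarily specified by more than one triplet.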
In summary, the stages of the central dogma of molecular biology indicate how the DNA is
transcribed into mRNA, which is then translated into amino acids and eventually into protein.
Once the mRNA is made, it is trimmed down to a final size and sent out of the nucleus; and, as the
mRNA gets into the cytoplasm within the cell, it is translated into the desired protein. In essence,
the "messenger", namely, the nucleic acid (dubbed as mRNA), carries the genetic instructions
from the DNA into the cytoplasm. That is, the mRNA depicts a copy of a gene and serves the
function of bridging the entities, DNA and protein. Thus, the gene part of the DNA transcribed
in each chromosome provides the instructions for a protein to be made. The statistical
features of the order of nucleotide bases in the DNA specify the random order of codons
in the mRNA; and, the stochastic profile of these codons sequenced in the mRNA
specifies the posentropy (or information in Shannon’s sense) in deciding the protein complex
synthesized. (The informatic aspects of such message-bearing entities are discussed in Chapter
I.2).
Subsequent to transcription, the translation process involves the mRNA
being transported into the cytoplasm, where the encoded information in the gene-template is
"decoded" or "translated" so as to produce the correct order of amino acids in a protein. This
translation is enabled by the intervention of certain RNA molecules listed below:
Ribosomal RNA (rRNA): These are RNA molecules that associate with certain proteins to form
the ribosomes; and, each ribosome at a time can accept two transfer RNAs (tRNAs) and
one mRNA.
Transfer RNA (tRNA): The tRNAs are small RNA molecules that carry a specific amino acid at one end and an
anticodon region that recognizes and binds mRNA at the other end. The tRNA that binds
to a given mRNA codon determines what amino acid is added to the protein chain.
Thus, the three RNAs (namely, mRNA, tRNA, and rRNA) collectively turn
the genetic information in the DNA into an eventual 3-dimensional protein. In all, translation
is the process by which the nucleotide sequence of mRNA is converted to the amino acid
sequence of a polypeptide. (In prokaryotic single-celled organisms, such as bacteria, the
translation process also takes place in the cytoplasm.) The steps from DNA to protein are
illustrated in Figure I.1.A:
[Figure diagram: DNA, transcribed to messenger RNA in the cell nucleus, translated to protein in the cytoplasm. DNA (5’ to 3’), transcription to precursor mRNA and then mRNA, translation to protein and folded protein. Legend: Intron: non-informative segment; Exon: information-bearing segment; Untranslated region (UTR)]
Figure I.1.A-1.x Making of a protein: The central dogma of a DNA defining the synthesis of
protein using an RNA intermediary
[Figure diagram: dsDNA strands (5´ to 3´) with WC matched pairs; the strand T G T A C C T A is read by RNA polymerase to give the mRNA A C A U G G A U]
Figure I.1.A-1.x
Pre-mRNA synthesis through transcription
The genetic information contained originally in the DNA is carried forward when the bases are
set in triplet forms. With the four bases {A, C, T, G}, the possible permuted triplets are 64 (= 4^3),
and they are grouped into 20 amino acids; and, each amino acid, being specified by a triplet of
bases, bears the mapped genetic code of the DNA. The transcription occurs at a specific site on one strand of
DNA known as the transcription initiation site, marked by a characteristic base sequence. The
transcription proceeds through a specific chemical pairing, namely, the WC-pairing (A ⇔ T/U)
and (G ⇔ C) mentioned earlier. The transcription process, in essence, is an information retrieval
technique from the original memory units of the DNA.
In the subsequent process of translation, the information contained in the mRNA codons
instructs the cell in what order the amino acids are to be strung together in the making of the protein
constituents. The eventual (correct) translation of eukaryotic genomic data into a protein
complex is, however, subject to the effects of mutations on the evolutionary conservation. Any
underlying corruptions may manifest at the so-called splice junctions that separate/delineate two
subsequences in a DNA sequence, namely, the (genetic) information-bearing codon segment
(called an exon) and the non-informative “junk” segment, also known as a non-codon or intron.
Exons bear necessary information towards protein-making, whereas non-codons are
non-informative and their genetic role has not been fully elucidated. Exons and introns appear
randomly along the DNA sequence as shown in Fig. 2. Codon segments (exons) tend to be typically no more than
200 characters long, while noncodons could be tens of thousands of characters in length. Thus,
introns make up the major part of a typical eukaryotic gene.
Towards the process of protein-making, introns are first scissored out (in the processing
that follows transcription) from the sequence and the remaining exons are spliced together, constituting the mRNA,
which is rendered ready for translation into a protein complex (at the cell interior). Should any
errors have occurred (due to mutations), they would give room to the possibility of evolving
wrong or cryptic splice-junctions and lead to (imperfect) translations. That is, aberrant
splice-junctions may result from the mutational spectrum and would hamper the making of correct proteins.
In short, the concept of central dogma culminating in protein-making can be summarized as
follows: Transcription: Relevant to a dsDNA helix, a portion of it
that corresponds to a gene is first unzipped to form a messenger (mRNA) as shown in Figure
I.1.A-r. This unzipped part thus belongs to one side of the DNA, namely, the 3´- 5´ reverse
strand. In the transcript of the unzipped part, the nucleotide base T is replaced by uracil (U). The resulting
product is the pre-mRNA. Of its contents, namely, the exons and introns, the introns are removed
and the exons are spliced together to constitute a mature mRNA, which leaves the nucleus to be
translated by the ribosome. The sequence of DNA, which encodes the sequence of the amino
acids in a protein, is thus copied into an mRNA chain.
[Figure diagram: ribosomal subunits on an mRNA, with the direction of translation marked; tRNAs carrying amino acids (AA) pair their anticodons with the mRNA codons and add to the polypeptide AA-chain (… Ala, Pro, Lys, Pro …) running from N-terminus to C-terminus; an empty tRNA exits after donating its amino acid]
Figure I.1.A-1.x The sequence of the mRNA molecule controls the forming of polymeric protein
molecules towards protein synthesis. (The ribosomes instruct tRNAs to bring in specific amino
acids into the sequence as dictated by the mRNA, which itself was built on the basis of the genetic
information of the nucleotides in the original gene portion of the DNA.)
The ribosome denotes a complex molecular machine seen in all living cells that offers the
site for the transcribed DNA (existing as the mRNA) towards protein synthesis (or the
translation process). The task of the ribosomes is to link together the amino acids
(specified by nucleotide triplets or codons) in the order specified by mRNA molecules. Ribosomes
consist of two major components: the small ribosomal subunit, which reads the RNA, and the
large subunit, which joins amino acids to form a polypeptide chain. Each subunit is composed of
one or more ribosomal RNA (rRNA) molecules and a variety of proteins. The ribosomes and
associated molecules are also known as the translational apparatus.
As stated earlier, the segment of a DNA that encodes a sequence of amino
acids towards protein-making is copied many times into RNA chains. The ribosomes can bind to
such an mRNA chain and use it to determine the correct sequence of amino acids. That is, amino
acids are selected, collected, and carried to the ribosome by transfer RNA (tRNA) molecules that
enter one part of the ribosome and bind to the mRNA chain. In other words, the tRNA is a specific type
of RNA molecule that helps decode an mRNA sequence into a protein. The tRNAs function at
specific sites in the ribosome during translation, the process of synthesizing a protein from an mRNA
molecule. The proteins, as described above, are built from smaller units of amino acids; and, each
codon that represents a particular amino acid is recognized by a specific tRNA. The recognition
process is as follows: The tRNA molecule has a characteristic folded structure with three hairpin
loops that form the shape of a three-leafed clover as shown in Figure I.1.A-t. One of these
hairpin loops contains a sequence called the anticodon, which can recognize and decode an
mRNA codon. Each tRNA has its corresponding amino acid attached to its end. When a tRNA
recognizes and binds to its corresponding codon in the ribosome, the tRNA transfers the
appropriate amino acid to the end of the growing amino acid chain. Then, the tRNAs and
ribosome continue to decode the mRNA molecule until the entire sequence is translated into a
protein.
[Figure diagram: clover-leaf tRNA with its ends labeled 3’ (C-terminus) and 5’ (N-terminus), held at the ribosome, its anticodon base-paired (||) with a codon on the mRNA]
Figure I.1.A-t. Three-leafed clover shaped molecule of a tRNA with an example of anticodon
The tRNA involved in the translation of the nucleic acid message into the amino acids of
proteins is itself an RNA molecule having a conserved inverted L-structure. One end of the
tRNA is an anticodon loop that pairs with an mRNA codon specifying a certain amino acid at that site.
The other end of the tRNA has the amino acid attached to the 3' OH-group via an ester linkage.
A tRNA with an amino acid attached in this way is said to be "charged", and the enzyme that
attaches the amino acid to the 3'-OH is called an aminoacyl-tRNA synthetase (aaRS). In all, a
specific tRNA is designated for each of the 20 amino acids. Likewise, there is a specific aaRS for
each tRNA. Only the pairing with the first 2 nucleotides of the codon is strictly required for the
decoding of an mRNA codon into an amino acid because, as stated before, the third nucleotide in any
codon is less stringent inasmuch as it represents the "wobble" base; and, the genetic code is
degenerate, meaning that more than one codon can specify a single amino acid. As such, the
anticodon of a tRNA can pair with more than one mRNA codon and still be specific for a single
amino acid. For example, suppose the tRNA has the anticodon CCC, which is complementary to
the codon GGG that specifies the amino acid glycine; apart from GGG, the three other
codons GGA, GGU and GGC also specify the amino acid glycine and can be recognized
by tRNAs via the relevant anticodons.
Thus, as shown in Figure I.1.A-t, the anticodon region of a tRNA is a sequence of three
bases, GCC, that are complementary to a codon GGC in the messenger RNA. That is, for each
coding trinucleotide (codon) in the mRNA, there is a distinct matching tRNA that carries the correct
amino acid for that coding triplet. The attached amino acids are then linked together by another
part of the ribosome. Thus, during translation, the bases of the anticodon form complementary base
pairs with the bases of the codon by forming the appropriate hydrogen bonds; and, this binding
exercise enables correct translation of the nucleic acid sequence into the amino acid sequence.
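The codon-anticodon relation in the two examples above (GGC with GCC, GGG with CCC) is a reverse complement, since the two strands pair antiparallel. A minimal sketch, assuming strict Watson-Crick pairing only (wobble pairing at the third codon position is not modeled) and using the hypothetical helper name `anticodon`:

```python
# Watson-Crick pairing for RNA bases.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Anticodon (written 5' to 3') that pairs antiparallel with an mRNA
    codon (written 5' to 3'): complement every base, then reverse."""
    return "".join(RNA_PAIR[base] for base in reversed(codon))

for codon in ("GGC", "GGG"):
    print(codon, "->", anticodon(codon))
```

Applying the function twice returns the original codon, which is a quick sanity check on the pairing table.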
While DNA carries and stores the information for protein synthesis, the RNA is the entity
that carries out the instructions encoded in the DNA, and synthesis of correct proteins is critical in
the functioning of cells and organisms. There are three versions of RNA in the protein synthesis
exercise, namely: (i) mRNA: an entity framed from the DNA with a mapping of genomic
information; that is, the messenger RNA carries the genetic information copied from DNA
framed as a nexus of “words”, each constituted by a set of trinucleotides and, as indicated before,
each such triplet-word specifies a particular amino acid. (ii) tRNA: also known as sRNA or
soluble RNA, typically made of 76 to 90 nucleotides, it is a key that deciphers the code words in
mRNA. Each type of amino acid has its own type of tRNA, which binds it and carries it to the
growing end of a polypeptide chain whenever the next code word on the mRNA requires it. That is,
the correct tRNA with its attached amino acid is selected at each step because each specific
tRNA molecule contains a three-base sequence that can base-pair with its complementary code
word in the mRNA; and, (iii) ribosomal RNA (rRNA): an RNA that associates with a set of
proteins to form ribosomes, complexes which physically move along an mRNA molecule and catalyze the
assembly of amino acids into protein chains. They also bind tRNAs and various accessory
molecules necessary for protein synthesis. Ribosomes are composed of a large and a small subunit,
each of which contains its own rRNA molecule or molecules. The steps of translation can be
briefly stated as follows:
Initiation: In the first step of translation, all the relevant components, mRNA, tRNA and
ribosomal units, are brought together; that is, the mRNA entering the cytoplasm gets
associated with ribosomes, where the tRNAs (each carrying a specific amino acid) pair up with
the mRNA codons. That is, WC base-pairing (A ↔ U, G ↔ C) between mRNA codons and
tRNA anticodons determines the order of the (information-bearing) stretch of amino acids in a
protein.
Elongation: As the ribosome moves along the mRNA, the tRNAs transfer their amino acids one by
one to the growing protein chain, thus producing the protein by linking the residues
codon by codon.
Termination: When the ribosome hits a stop codon (UAA, UGA, or UAG), the function of the
ribosome stops, terminating the making of the protein. In the above description, the same mRNA
may be used hundreds of times during translation by many ribosomes before it is degraded
(broken down) by the cell.
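The three steps above can be mimicked with a short loop. This is a simplified sketch, not a model of the real ribosome: the mRNA string is hypothetical, the codon table is deliberately partial (only the codons that occur in that string), and initiation is approximated by taking the first AUG found.

```python
STOP_CODONS = {"UAA", "UGA", "UAG"}

# Partial codon table, just enough for the hypothetical mRNA below.
CODON_TO_AA = {"AUG": "Met", "GAU": "Asp", "CCG": "Pro", "AAA": "Lys"}

def translate(mrna):
    """Return the amino acid chain read from the first AUG up to a stop codon."""
    start = mrna.find("AUG")                   # initiation: locate the start codon
    if start < 0:
        return []
    chain = []
    for i in range(start, len(mrna) - 2, 3):   # elongation: one codon per step
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:               # termination: release the chain
            break
        chain.append(CODON_TO_AA[codon])
    return chain

protein = translate("ACAUGGAUCCGAAAUAGCU")     # hypothetical mRNA
print("-".join(protein))
```

Bases before the start codon and after the stop codon are ignored by the loop, which loosely corresponds to the untranslated regions of an mRNA.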
In all, the translation step covers the whole of the process by which the base sequence
of an mRNA is used to facilitate joining the amino acids in making a protein. All three types
of RNA summarized above participate in this protein-synthesizing pathway in all cells, a
molecular passage to the origin of life. Once a protein is produced, it can fold to assume a
specific three-dimensional structure intended for certain functional tasks. Some proteins may
start folding into their correct form during the synthesis process itself. Thus, protein refers to a
molecule made of one or more chains of amino acids in a specific order. This order is decided by
the base sequences of nucleotides in the gene coding for the protein. Proteins are essentially
required for the structure, function and regulation of cells, tissues and organs. Each protein has a
specific role, for example, as a hormone, an enzyme or an antibody. The following subsection
elaborates more details on proteins and the related study.
The essence of genomic and proteomic considerations detailing genes, codons (exons and
introns), amino acids, and the central dogma of molecular biology is explained in Chapter
I.1. Relevantly, how a DNA codes for RNA, which in turn codes for proteins; other genetic
aspects of heredity passed across the generations; and how the proteins, the complex molecules of
life, are structured to carry out functions needed for the survival of living systems are
discussed.
[Figure 2.1.x diagram: double-stranded genomic DNA (dsDNA); the unwound sense strand of genomic DNA with 5’ UTR, EXON, INTRON, EXON, INTRON and 3’ UTR regions (UTR: untranslated region; CDS: coding sequence); transcription to pre-mRNA (T T A G C A to U U A G C A) and mRNA; translation to the protein complex]
Figure 2.1.x Illustrations on central dogma of microbiology
In short, the passage of the central dogma, which conforms to the transitions DNA-to-RNA-to-protein via
three versions of RNA, namely, messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal
RNA (rRNA), is outlined in Figure 2.1.x. As regards the details on the above, relevant
examples, problems and project exercises are indicated in Chapter I.1. Also, more details on
reading frames and ORF are furnished therein. In this section, more considerations on genomic
and proteomic sequence details are presented consistent with the details of this chapter.
Amino acid (AA)
The 64 triplets indicated above can be grouped according to the 20 amino acids they specify; the
amino acids are commonly referred to as the “building blocks” of proteins. As mentioned earlier, the sequence of
nucleotides in a DNA molecule depicts the genetic information. The nucleotide sequence
specifies members of the set of 20 amino acids; and, a sequence of amino acids specified by selected triplets may
constitute a protein. Such a sequence representing a protein is encoded within the DNA
sequence in question. The relation between the sequence of nucleotides and the
corresponding amino acid sequence depicting the protein is called the genetic code.
The characteristics of each amino acid are dependent on their side chain, and they can be
divided into several classes. These classifications are denoted either as polar, nonpolar, acidic or
as basic. Figure I. 1.a-y shows the structure of an amino acid.
[Figure diagram: amino acid (AA) structure, with an amino group (NH2) and a carboxyl group (COOH) attached to a central carbon atom together with a side chain; different side chains lead to different amino acids]
Figure I.1.a-y Amino acid structure
Table I.i.A- The sixty-four (64 = 4^3) codon triplets constituted from the base set {A, C, T, G} and
grouping of the triplets into AAs

1st and 2nd    Codons with 3rd nucleotide    Amino acids
nucleotides    added (T, C, A, G)
TT             TTT  TTC  TTA  TTG            (F) Phe  (F) Phe  (L) Leu  (L) Leu i
TC             TCT  TCC  TCA  TCG            (S) Ser  (S) Ser  (S) Ser  (S) Ser
CT             CTT  CTC  CTA  CTG            (L) Leu  (L) Leu  (L) Leu  (L) Leu i
CC             CCT  CCC  CCA  CCG            (P) Pro  (P) Pro  (P) Pro  (P) Pro
AT             ATT  ATC  ATA  ATG            (I) Ile  (I) Ile  (I) Ile  (M) Met i
AC             ACT  ACC  ACA  ACG            (T) Thr  (T) Thr  (T) Thr  (T) Thr
GT             GTT  GTC  GTA  GTG            (V) Val  (V) Val  (V) Val  (V) Val
GC             GCT  GCC  GCA  GCG            (A) Ala  (A) Ala  (A) Ala  (A) Ala
TA             TAT  TAC  TAA  TAG            (Y) Tyr  (Y) Tyr  (*) Ter  (*) Ter
TG             TGT  TGC  TGA  TGG            (C) Cys  (C) Cys  (*) Ter  (W) Trp
CA             CAT  CAC  CAA  CAG            (H) His  (H) His  (Q) Gln  (Q) Gln
CG             CGT  CGC  CGA  CGG            (R) Arg  (R) Arg  (R) Arg  (R) Arg
AA             AAT  AAC  AAA  AAG            (N) Asn  (N) Asn  (K) Lys  (K) Lys
AG             AGT  AGC  AGA  AGG            (S) Ser  (S) Ser  (R) Arg  (R) Arg
GA             GAT  GAC  GAA  GAG            (D) Asp  (D) Asp  (E) Glu  (E) Glu
GG             GGT  GGC  GGA  GGG            (G) Gly  (G) Gly  (G) Gly  (G) Gly
Note:
1. ATG ⇒ (M) Met i: Start codon
The codon ATG (or AUG in mRNA) provides the "start"
(initiation) message for a ribosome that signals the initiation of
protein translation from mRNA. Hence, methionine appears in
the N-terminal position of all proteins in eukaryotes and archaea
during translation (although it is usually removed by
post-translational modification).
2. TAA ⇒ (*) Ter: Stop codon
3. TAG ⇒ (*) Ter: Stop codon
4. TGA ⇒ (*) Ter: Stop codon
The three codons TAG, TAA and TGA (also referred to as the
amber, ochre and opal codons, and corresponding to UAG, UAA and
UGA in mRNA) terminate (stop) translation.
Non-standard/non-canonical amino acids
A set of non-standard and non-coded amino acids known as non-proteinogenic, or "unnatural",
amino acids also exists. They are not
naturally encoded or found in the genetic code of any organism.
Apart from the tabulated 23 amino acids (21 in eukaryotes), that is, the
proteinogenic amino acids used by the translational machinery in
constructing the proteins, over 140 natural amino acids are
known and thousands more combinations are indicated as being
feasible [A. Ambrogelly, S. Palioura and D. Söll: Natural expansion
of the genetic code. Nature Chemical Biology, 2007 vol.3(1), 29–35].
The non-proteinogenic amino acids are considered in practice in view
of the following: (i) They are intermediates in biosynthesis and are
post-translationally incorporated into protein. (ii) They possess certain
physiological roles (for example, as components of bacterial cell
walls, neurotransmitters and toxins). (iii) They could be natural and
man-made pharmacological compounds. (iv) They are present in
meteorites and in certain prebiotic experiments.
The three extra proteinogenic amino acids are
selenocysteine (Sec, U), pyrrolysine (Pyl, O) and N-formylmethionine.
The formylmethionine is an amino acid encoded by the start codon
AUG in bacteria, mitochondria and chloroplasts, but removed
post-translationally.
Table A-I.2. Standard amino acids: List of twenty distinct amino acids and their 3-letter and
1-letter codes [IUPAC-IUB Joint Commission on Biochemical Nomenclature. Nomenclature and
Symbolism for Amino Acids and Peptides. European Journal of Biochemistry, 1984, vol. 138, 9-37]

Name                           3-letter code   1-letter code
Alanine                        Ala             A
Arginine                       Arg             R
Asparagine                     Asn             N
Aspartic Acid                  Asp             D
Cysteine                       Cys             C
Glutamine                      Gln             Q
Glutamic Acid                  Glu             E
Glycine                        Gly             G
Histidine                      His             H
Isoleucine                     Ile             I
Leucine                        Leu             L
Lysine                         Lys             K
Methionine                     Met             M
Phenylalanine                  Phe             F
Proline                        Pro             P
Selenocysteine                 Sec             U
Serine                         Ser             S
Threonine                      Thr             T
Tryptophan                     Trp             W
Tyrosine                       Tyr             Y
Valine                         Val             V
Aspartic acid or Asparagine    Asx             B
Glutamic acid or Glutamine     Glx             Z
Unspecified Amino Acid         Xaa             X
The nucleotides may undergo polymerization to form a long chain of polynucleotide. A
polynucleotide is denoted by prefixing “poly” to the repeating unit, for example, poly A
(polyadenylic acid), poly T (polythymidylic acid), poly G (polyguanylic acid), poly C
(polycytidylic acid) and poly U (polyuridylic acid). When the polynucleotides have the same
repeating units, they are designated as homopolynucleotides.
Though the meaning of each codon is the same in most known organisms, the genetic code has
been found to differ for a few codons in many mitochondria, in ciliated protozoans, and in
Acetabularia, a single-celled plant, as indicated in Table A-I.3. Such exceptions to the general
code are regarded as due to later evolutionary developments.
Table A-I.3: Unusual codon usage in nuclear and mitochondrial genes [S. Osawa, TH Jukes, K.
Watanabe and A. Muto: Recent evidence for evolution of the genetic code. Microbiology and
Molecular Biology Reviews. 1992, vol. 56(1), 229-264]

Codon       Standard         Unusual code       *Occurrence in: Species
(triplet)   universal code   (3-letter format)
UGA         Stop             Trp                Mycoplasma, Spiroplasma, mitochondria of several species
CUG         Leu              Thr                Mitochondria in yeasts
UAA, UAG    Stop             Gln                Acetabularia, Tetrahymena, Paramecium, etc.
UGA         Stop             Cys                Euplotes

*“Unusual code” is used in nuclear genes of the listed
organisms and in mitochondrial genes as indicated.
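The effect of such reassignments can be illustrated by translating one mRNA under the standard code and under a variant. The sketch below is illustrative: the reassignments are taken from the table above, while the mRNA string, the dictionary keys in `VARIANTS` and the function name `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code in RNA letters; codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Codon reassignments from the table, one entry per group of organisms.
VARIANTS = {
    "Mycoplasma/Spiroplasma": {"UGA": "W"},
    "yeast mitochondria": {"CUG": "T"},
    "Acetabularia/Tetrahymena/Paramecium": {"UAA": "Q", "UAG": "Q"},
    "Euplotes": {"UGA": "C"},
}

def translate(mrna, overrides=None):
    """Translate from the first base, stopping at the first stop codon."""
    table = {**STANDARD, **(overrides or {})}
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        aa = table[mrna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)

mrna = "AUGUGAGGU"  # hypothetical message containing UGA
print("standard:", translate(mrna))
for organism, overrides in VARIANTS.items():
    print(organism + ":", translate(mrna, overrides))
```

Under the standard code the message ends at UGA, whereas the variants that reassign UGA read through it, so the same mRNA yields different peptides in different organisms.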