CHAPTER I.
INTRODUCTION
Exploratory Analysis of the Genetic Variability of the PE-PGRS Gene
Overview
The global burden of tuberculosis (TB) is staggering, with almost 9 million cases per annum (1). India and China have the largest incidence (the number of new cases of active TB disease). India was estimated to have 2.2 million incident cases of TB in 2015 (World Health Organization (WHO), 2016), while China has the second largest TB epidemic, with an estimated incidence of 930,000 cases recorded in 2014 (http://www.tbfacts.org/china/, visited 18/06/2017). South Africa is one of the countries with the largest burden of TB (i.e., the largest number of persons infected per capita), with an estimated incidence of 450,000 cases of active TB in 2013 (WHO, 2016). In South Africa, TB is the leading cause of death. The country also has the highest burden of HIV infection globally (approximately 7.03 million people in 2016) (1). Consequently, individuals co-infected with HIV are driving the epidemic and account for more than 60% of new TB cases (2). In HIV-infected persons, latent TB infection (LTBI) co-infection rates are high (>80%), and progression from LTBI to active TB occurs at a rate of up to 10% per annum, rather than 10% over a lifetime (3). This persistent reservoir of infection is subverting eradication efforts, primarily because current tools to detect LTBI, and to differentiate it from subclinical or active TB, are sub-optimal (3).
We also know very little about the fundamental biology of LTBI, including its physical nature and its transcriptional profile in human tissues (4). For instance, although the granuloma is the characteristic hallmark of mycobacterial disease (5), presumed LTBI is often not visually apparent. One report showed that viable mycobacteria could be cultured from macroscopically normal tissue inoculated into guinea pigs (5), and several reports from around the world, dating back as early as 1900 (Otto Naegeli, 1900. Virchows Arch. Path. Anat., 160, 426), showed that mycobacteria could be cultured from the lungs of individuals who died of other diseases. We also know from large controlled trials that treating TST+ve (tuberculin skin test-positive) persons without active TB reduces the development of active TB by 60 to 80% (5); conversely, the positive predictive value of tests such as the IGRA (interferon gamma release assay) and the TST for the development of active TB is only about 2 to 3% in HIV co-infected persons (6). Collectively, these data indicate that although TB infection is a spectrum of disease (7), the pathophysiology and associated biological processes of viable, persistent but non-culturable mycobacteria in tissues remain unclear.
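To make the contrast between a 10% annual risk and a 10% lifetime risk of progression concrete, the short sketch below computes cumulative risk. It assumes, as a simplification not stated in the cited work, that the annual risk is constant and independent from year to year. Under that assumption, 10% per annum compounds to roughly 41% after five years and about 65% after ten.

```python
def cumulative_risk(annual_risk: float, years: int) -> float:
    """Probability of progressing at least once within `years`.

    Assumes a constant annual risk that is independent from year to year,
    which is a simplification for illustration only.
    """
    if not 0.0 <= annual_risk <= 1.0:
        raise ValueError("annual_risk must be a probability")
    # Probability of *not* progressing in any single year, compounded over the period.
    return 1.0 - (1.0 - annual_risk) ** years


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"{years:>2} years: {cumulative_risk(0.10, years):.0%}")
```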
Mycobacterium tuberculosis (M.tb) is an obligate pathogenic bacterial species in the family Mycobacteriaceae and the causative agent of TB. The genome of the M.tb H37Rv strain was first published in 1998 [32]. It is approximately 4 million base pairs in size and contains approximately 3,959 genes, 40% of which have been characterized. The gene features include 40% lipids, ----% carbohydrate and ----% of …. Importantly, roughly 10% of the coding capacity is taken up by the PE/PPE gene families, which encode acidic, glycine-rich proteins. These proteins have a conserved N-terminal motif, deletion of which impairs growth in macrophages and granulomas [35]. There is considerable interest in this gene family for its regulation of immune cells, as well as its role in the clinical manifestation of TB and LTBI. A better understanding of the genetic variability of this gene family could shed light on its role in virulence.
One of the surprises emerging from the analysis of the first sequenced M. tb genome (the
laboratory strain H37Rv) was the discovery of two large gene families, designated pe and ppe,
that in H37Rv comprise 99 and 69 members respectively and together account for around 10% of
the organism's genomic coding potential [5]. Pe genes are characterised by the presence of a
proline-glutamic acid (PE) motif at positions 8 and 9 within a highly conserved N-terminal
domain consisting of around 110 amino acids. Similarly, ppe genes contain a proline-proline-glutamic
acid (PPE) motif at positions 7 to 9 in a highly conserved N-terminal domain of approximately
180 amino acids. The C-terminal domains of both pe and ppe protein families are highly variable
in both size and sequence and often contain repetitive DNA sequences that differ in copy number
between genes [5].
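As a minimal illustration of the positional definitions above, the following sketch classifies a protein by checking for PPE at residues 7 to 9 and PE at residues 8 and 9 (1-based). It assumes the motif sits at exactly these positions; real family assignment relies on homology across the whole N-terminal domain. The function name and the toy sequences are hypothetical.

```python
from typing import Optional


def classify_pe_ppe(protein: str) -> Optional[str]:
    """Return 'PPE', 'PE' or None from the fixed-position N-terminal motif.

    Positions follow the text (1-based): PPE at residues 7-9, PE at residues 8-9.
    PPE is checked first because a PPE motif also has 'PE' at residues 8-9.
    """
    seq = protein.upper()
    if seq[6:9] == "PPE":  # residues 7, 8 and 9
        return "PPE"
    if seq[7:9] == "PE":  # residues 8 and 9
        return "PE"
    return None


# Hypothetical toy N-terminal fragments, not real database entries.
for name, fragment in [("pe_like", "MSFVTTQPEALAAAAANLQTIG"),
                       ("ppe_like", "MDFGALPPEINSGRMYAG"),
                       ("neither", "MAAAAAAAAAAA")]:
    print(name, classify_pe_ppe(fragment))
```

Checking PPE before PE matters: a purely positional PE test would also accept every PPE sequence.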
The pe and ppe gene families can be divided into sub-families based on similarities in their N-
terminal regions and the phylogenetic relationships between each gene sub-family have been
previously described, demonstrating that their evolutionary expansions are linked to the
duplications of the ESAT-6 (esx) gene clusters [8]. Ppe genes can be subdivided into 5
subfamilies, the most numerous of which are the ppe_svp (24 members) and the ppe_mptr
(major polymorphic tandem repeat) subfamilies (23 members) (Fig. 1a). Pe genes can also be
divided into 5 sub-families, the largest of which, the polymorphic GC-rich-repetitive sequence
(pe_pgrs), comprises 65 members in H37Rv (Fig. 1b). This sub-family is characterised by a C-
terminal domain that contains multiple tandem repeats of a glycine-glycine-alanine (Gly-Gly-
Ala) or a glycine-glycine-asparagine (Gly-Gly-Asn) motif. Phylogenetic analysis indicates that
the emergence of the large pe_pgrs and ppe_mptr subfamilies is a recent evolutionary event, with
their presence being restricted to members of the MTBC and close relatives such as M. marinum
and M. ulcerans [8].
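A rough way to examine the Gly-Gly-Ala and Gly-Gly-Asn repeats described above is to count those motifs after the N-terminal domain. The sketch below uses a hypothetical helper, `pgrs_repeat_profile`, that treats everything after residue 110 as the C-terminal domain, an approximation based on the domain length quoted in the text. It counts non-overlapping motif occurrences rather than formally detecting tandem repeats.

```python
from typing import Dict


def pgrs_repeat_profile(protein: str, n_terminal_length: int = 110) -> Dict[str, float]:
    """Summarize glycine-rich motifs in the region after the N-terminal PE domain.

    `n_terminal_length` approximates the ~110-residue PE domain; everything after
    it is treated as the C-terminal (PGRS) domain.
    """
    c_terminal = protein.upper()[n_terminal_length:]
    if not c_terminal:
        return {"GGA": 0, "GGN": 0, "glycine_fraction": 0.0}
    return {
        # str.count counts non-overlapping occurrences, scanning left to right.
        "GGA": c_terminal.count("GGA"),  # Gly-Gly-Ala
        "GGN": c_terminal.count("GGN"),  # Gly-Gly-Asn
        "glycine_fraction": c_terminal.count("G") / len(c_terminal),
    }


# Hypothetical toy protein: a 110-residue placeholder domain plus a short glycine-rich tail.
toy = "M" * 110 + "GGAGGAGGNGGAAGG"
print(pgrs_repeat_profile(toy))
```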
To understand the evolutionary relationship between PE and PPE genes in M.tb, a phylogenetic tree based on a multiple sequence alignment of all the proteins encoded by members of the two gene families was constructed (Figure 3). The tree was constructed from the N-terminal sequences of ninety-six PE family proteins, and the PPE protein from ESAT-6 (esx) gene cluster region 1 (Rv3873) was chosen as an outgroup. The same tree topology was conserved when the complete protein sequences were used for the analysis.
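Distance-based methods are one common way to build such a tree, although the text does not state which method was used. With that caveat, the sketch below computes a pairwise p-distance matrix (the proportion of differing residues, skipping gap columns) from an alignment of hypothetical N-terminal fragments. A matrix like this could then be passed to a clustering method such as neighbor-joining or UPGMA, and the resulting tree rooted on the outgroup.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return differences / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise p-distances for every pair of named aligned sequences."""
    matrix: Dict[Tuple[str, str], float] = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        d = p_distance(seq_a, seq_b)
        matrix[(name_a, name_b)] = d
        matrix[(name_b, name_a)] = d
    return matrix


# Hypothetical aligned N-terminal fragments; "outgroup" stands in for the root sequence.
alignment = {
    "pe_a": "MSFVTTQPEAL",
    "pe_b": "MSFVIAAPEAL",
    "outgroup": "MAKQLTNGDLV",
}
for (a, b), d in sorted(distance_matrix(alignment).items()):
    if a < b:
        print(f"{a} vs {b}: {d:.3f}")
```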
Figure 2. A schematic demonstration of PE/PPE family sub-groups.
PE and PPE proteins have relatively conserved N-terminal domains, and each family can be split into discrete subgroups based on its variable C-terminal domains. The major contributors to genome polymorphism are the regions encoding the polymorphic GC-rich sequence (PGRS) subfamily of the PE family and the major polymorphic tandem repeat (MPTR) subfamily of the PPE family.
The PE-PGRS gene family has been associated with membrane pore formation, protease processing and the secretion of antigens for T-cell recognition. The PE_PGRS family is essentially unique to the MTBC and its close relatives, and it has been widely proposed as a molecular mantra to deflect host immunity (11). These genes are also of interest because the orthologue of one such gene in the closely related M. bovis genome is a pseudogene, the absence of which could potentially influence host or tissue tropism.
Figure 2. Phylogenetic trees of the PE and PPE proteins, respectively, present within the ESAT-6
(esx) gene clusters in M. tuberculosis H37Rv, demonstrating a duplication order similar to that
observed with other genes in the M. tuberculosis ESAT-6 (esx) gene cluster regions [1]. (A) PE
proteins, (B) PPE proteins, (C) ESAT-6/CFP-10 proteins.
PE/PPE gene pairs are frequently associated with the ESAT-6 (esx) gene clusters in M. tuberculosis, and the PE/PPE genes of ESAT-6 regions 1, 3, 2 and 5 represent ancestral members of the two families. The figure shows that the tree topologies correspond to each other, suggesting a co-evolutionary history for the two gene families. This evolutionary scenario is also congruent with the evolutionary history determined for the five ESAT-6 (esx) gene clusters, with duplication events of the PE and PPE genes contained in and associated with these regions expanding sequentially from region 1 to region 3, region 2 and lastly region 5. The topology of the phylogenetic trees suggests that the PE_PGRS and PPE_MPTR subfamilies are the result of the most recent evolutionary events and evolved from the sublineage that includes the ESAT-6 (esx) gene cluster region 5 PE and PPE genes.
Region 4 is also present outside the genus Mycobacterium, and the absence of PE/PPE genes from this most ancestral ESAT-6 region indicates that they were integrated into a duplicate of this region and then co-duplicated with the rest of the regions. Phylogenetic analysis of the PE family genes and the ESAT-6 genes supports this co-duplication hypothesis (12). The PE genes situated in region 5 of the cluster are present in multiple copies, indicating that this region is highly prone to duplication (Figure 2).
Complete genome sequencing of an organism provides a wealth of information about phenotype and evolution, and it can be used to trace the evolution of genes and gene families, such as the expansion of the PE family in the ESAT-6 gene cluster regions. To construct a phylogenetic tree and a robust evolutionary history, it is important to identify the most ancestral representative of the family. Orthologous genes of the family are identified by comparative genomics, which compares the genomes of different species to find their differences and similarities. In the case of PE_PGRS, ESAT-6 cluster region 1 is the most ancestral and region 5 evolved most recently (15). These regions contain the PE/PPE proteins present within the ESAT-6 (esx) gene clusters of M. tuberculosis H37Rv.
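The text does not say how orthologues were identified. One widely used comparative-genomics heuristic is the reciprocal best hit (RBH): two genes from different genomes are called putative orthologues if each is the other's highest-scoring match. The sketch below assumes similarity scores (for example, alignment bit scores) have already been computed; the gene names and score values are hypothetical.

```python
from typing import Dict, List, Tuple

# (query gene, subject gene) -> similarity score, e.g. an alignment bit score.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene (first one wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Gene pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores between genes of two genomes (names and values are invented).
a_vs_b = {("tbA", "bovA"): 95.0, ("tbA", "bovB"): 40.0,
          ("tbB", "bovA"): 85.0, ("tbB", "bovB"): 80.0}
b_vs_a = {("bovA", "tbA"): 95.0, ("bovA", "tbB"): 85.0,
          ("bovB", "tbA"): 40.0, ("bovB", "tbB"): 80.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy case tbB is not paired: its best hit, bovA, prefers tbA, so the match is not reciprocal.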
The proline-glutamic acid (PE) motif is found near the N-terminus in most cases. The conserved N-terminal region of approximately 110 amino acids is characterized by the PE motif at positions 8 and 9, while the C-terminal region varies in size, sequence and repeat number. The PE family is subdivided into several subfamilies, of which PGRS is the largest, with 65 members (20). The tandem repeats in the C-terminal domain of this subfamily are glycine-glycine-alanine (Gly-Gly-Ala) or glycine-glycine-asparagine (Gly-Gly-Asn). Another PE subfamily, with 34 members, shows low homology in its C-terminal domain (21).
Previous studies have strongly indicated that members of the PPE family are likely to be cell-wall associated, and certain PE_PGRS proteins have been shown to be cell-surface constituents that influence cellular structure and colony morphology. These studies also indicate that the remarkable length and sequence variation of the PE-PGRS proteins is due to lengthy stretches of GC-rich sequence, which can be interspersed with regions of divergent sequence up to 60 amino acids long. Because of the repetitive, redundant sequence of the PGRS domain, PE-PGRS genes are thought to be hotspots for recombination and single nucleotide polymorphisms (SNPs). Several analyses have shown that clinical isolates of the MTBC harbor polymorphisms in these genes; for example, sequence variation in PE_PGRS33 has been found in 68% of clinical isolates. The driving force behind this sequence diversity may be immune recognition by the host (22).
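Since the PGRS regions are described as lengthy GC-rich stretches, a simple first look at a nucleotide sequence is its GC content, overall and in sliding windows. This is a minimal sketch: the window and step sizes are arbitrary assumptions, and the toy sequence is not a real PE_PGRS gene.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int = 100, step: int = 50) -> List[Tuple[int, float]]:
    """(0-based start, GC fraction) for every full-length window along the sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


# Hypothetical toy sequence: an AT-rich stretch followed by a GC-rich stretch.
toy = "ATATATAT" + "GGCGGCGG"
print(gc_content(toy))
print(gc_windows(toy, window=8, step=4))
```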
PE proteins may inhibit antigen processing through their interaction with the immune system. In support of this, one report states that a DNA vaccine construct based on the N-terminal region of a PE-PGRS protein can elicit an immune response, whereas a construct based on the whole PE-PGRS protein cannot, suggesting that the PGRS repeat region can influence antigen processing and presentation. There is also evidence linking the PE_PGRS region to virulence in non-pathogenic Mycobacterium species: in the fast-growing M. smegmatis, the PE_PGRS region helps the bacteria survive in infected macrophages. The PGRS region is also associated with bacterial shape and colony morphology and is necessary for subcellular localization (21,22).
One way in which PE and PPE proteins may interact with host immunity is through inhibition of antigen processing. This is supported by the observation that a DNA vaccine construct based on the conserved N-terminal PE region of a PE_PGRS protein elicits a cellular immune response, whereas a construct containing the full PE_PGRS protein does not, suggesting that the PGRS repeats can influence antigen processing and presentation (23).
Complete genome sequencing has provided a wealth of information on the evolution and distribution of PE and PPE genes in the genus Mycobacterium. Before it, little was known about these genes and their association with the ESAT-6 gene clusters. With the information obtained from sequencing, it is now possible to trace the evolution of genes and gene families (23).
Because PE and PPE genes are found only in the genus Mycobacterium, analyzing them can greatly improve our understanding of M. tuberculosis. PE/PPE gene pairs are frequently associated with the ESAT-6 (esx) gene clusters in M. tuberculosis, and analysis of the constituent genes indicates a duplication order from the ancestral region 4 (Rv3444c-Rv3450c) to region 1 (Rv3866-Rv3883c), region 3 (Rv0282-Rv0292), region 2 (Rv3884c-Rv3895c) and lastly region 5 (Rv1782-Rv1798) (22,23).
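The region boundaries and duplication order listed above can be encoded as a small lookup table. The sketch below assumes that every H37Rv locus whose numeric part lies between a region's first and last locus tags belongs to that region, and it ignores the strand suffix (for example, the `c` in `Rv3883c`). `esx_region` is a hypothetical helper, not part of any annotation tool.

```python
import re
from typing import List, Optional, Tuple

# (region, first locus tag, last locus tag) as listed in the text, in the inferred
# duplication order: ancestral region 4, then 1, 3, 2 and lastly 5.
ESX_REGIONS: List[Tuple[int, str, str]] = [
    (4, "Rv3444c", "Rv3450c"),
    (1, "Rv3866", "Rv3883c"),
    (3, "Rv0282", "Rv0292"),
    (2, "Rv3884c", "Rv3895c"),
    (5, "Rv1782", "Rv1798"),
]


def locus_number(tag: str) -> int:
    """Numeric part of an H37Rv-style locus tag, ignoring suffixes ('Rv3883c' -> 3883)."""
    match = re.fullmatch(r"Rv(\d{4})[A-Za-z]*", tag)
    if match is None:
        raise ValueError(f"not an H37Rv-style locus tag: {tag!r}")
    return int(match.group(1))


def esx_region(tag: str) -> Optional[int]:
    """ESAT-6 (esx) region containing the locus, or None if it lies outside all five."""
    n = locus_number(tag)
    for region, first, last in ESX_REGIONS:
        if locus_number(first) <= n <= locus_number(last):
            return region
    return None


duplication_order = [region for region, _, _ in ESX_REGIONS]
print("Duplication order:", duplication_order)
for tag in ("Rv3872", "Rv3873", "Rv1790", "Rv0001"):
    print(tag, esx_region(tag))
```

With these ranges, Rv3872 (PE35) and Rv3873, the outgroup sequences mentioned in this chapter, both fall in region 1.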
Figure 3. Phylogenetic reconstruction of the evolutionary relationships between the members of
the PE protein family.
The phylogenetic tree was constructed from analyses of the 110-amino-acid N-terminal domains of
the PE proteins. The tree was rooted to the outgroup, Rv3872
(PE35), shown to be the first PE insertion into the ESAT-6 (esx) gene clusters (region 1). The
genes highlighted in purple, green and blue are present in ESAT-6 (esx) gene cluster region 1, 3
and 2, respectively. Genes highlighted in red are present in or have been previously shown to be
duplicated from ESAT-6 (esx) gene cluster region 5 [1] and genes highlighted in yellow are
members of the PGRS subfamily of the PE family. Arrows indicate orthologues of genes
identified to be present within the M. smegmatis genome sequence. Five sublineages (including
the PE_PGRS subfamily) are indicated by Roman numerals.
Resistance of TB to drugs has increased over the years. M. tuberculosis has also evolved numerous mechanisms to overcome, evade or exploit host immune responses; one such mechanism, antigenic variation, consists of generating escape mutants that are not recognized by existing host molecules. Together with the emergence of multidrug-resistant and extensively drug-resistant M.tb strains, this has prompted an urgent need for new diagnostics and therapies (24).
A notable link has been found between the PE/PPE families and mycobacterial lipid metabolism. Upon entry into the macrophage, M. tuberculosis undergoes physiological adaptations to the intracellular environment, including the upregulation of lipid metabolism genes. It then scavenges the host's triacylglycerol (TAG) and incorporates the fatty acids into its storage bodies or converts them to mycobacterial lipids, using up these lipid inclusions in times of starvation, oxygen depletion or reactivation (25). Upon starvation, M. tuberculosis upregulates the expression of lipase genes, including two members of the PE family: LipY (PE_PGRS63, sublineage V) and LipX (PE11, sublineage IV). LipY is a PE/PPE protein secreted by ESX-5 into the extracellular milieu, where its TAG lipase activity cleaves TAG into diacylglycerol, allowing the bacterium to scavenge TAG as a nutrient source (25).
M.tb infection is established upon entry into the host, and several PE/PPE proteins have been linked to inhibition and/or activation of host macrophage activity. M.tb virulence and its ability to grow in macrophages depend on the ESX-5 secretion system together with PE/PPE proteins, and PE5 (sublineage II), PE15 (sublineage II) and PE4 (sublineage V) have been shown to increase mycobacterial survival within macrophages (26).
It has been demonstrated that PE/PPE proteins are associated with mycobacterial secretion systems and frequently localize to the mycobacterial surface, and that these families contain lipolytic enzymes, part of the mycobacterial metabolome with implications for virulence. This underlines the relevance of PE/PPE proteins to M.tb virulence as key components of the cell wall that interact with the host to promote bacterial survival (25,26).
PE_PGRS33 is one of the 60 PE_PGRS genes in the M.tb genome and encodes a surface-exposed protein believed to be involved in the antigenic variation of M. tuberculosis strains and evasion of the host immune system. No detailed comparison of genetic variation between the PE_PGRS33 genes of H37Rv and CDC1551 has been conducted, even though genetic differences between them have been noted (28).
Two of the PE_PGRS proteins, PE_PGRS17 and PE_PGRS18 (sublineage V), have tandem repeats of 41-43 amino acid residues in their C-terminal variable regions. A study of the PE-PGRS family showed that T-cell recognition plays an insignificant role in diversifying selection; closer examination suggested that the highly immunogenic nature of selected family members is driven by a high degree of immunological cross-reactivity (29).
Another study, of 123 clinical isolates of M.tb, sought a better understanding of the genetic basis of the role of PE_PGRS in antigenic variation and evasion of the host immune system. Using PCR and DNA sequencing, about 68% of the isolates were found to have at least one sequence variation in the PE_PGRS33 gene relative to H37Rv (30).
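The 68% figure is the fraction of isolates with at least one difference from the H37Rv sequence; for scale, 68% of 123 isolates is roughly 84 isolates. Assuming the isolate sequences have already been aligned to the reference (the cited study used PCR and DNA sequencing, and the alignment step is not shown here), that fraction can be computed as in this sketch. Gaps count as differences, and the sequences and helper names are hypothetical.

```python
from typing import Dict, List


def differing_sites(reference: str, isolate: str) -> List[int]:
    """1-based alignment columns where an aligned isolate differs from the reference.

    Gap characters ('-') are compared like any other character, so they count as differences.
    """
    if len(reference) != len(isolate):
        raise ValueError("isolate must be aligned to the reference (same length)")
    return [i + 1 for i, (r, s) in enumerate(zip(reference.upper(), isolate.upper())) if r != s]


def fraction_with_variation(reference: str, isolates: Dict[str, str]) -> float:
    """Fraction of isolates with at least one difference from the reference."""
    if not isolates:
        raise ValueError("no isolates given")
    n_variable = sum(1 for seq in isolates.values() if differing_sites(reference, seq))
    return n_variable / len(isolates)


# Toy reference and isolates (hypothetical, already aligned).
reference = "ATGGCCGGA"
isolates = {"iso1": "ATGGCCGGA", "iso2": "ATGGCTGGA", "iso3": "ATG---GGA"}
for name, seq in isolates.items():
    print(name, differing_sites(reference, seq))
print(f"{fraction_with_variation(reference, isolates):.0%} of isolates vary")
```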
The functions of PE-PGRS proteins remain unknown. However, many experimental studies of the PE and PPE families lead us to speculate that they may contribute to enhanced genetic variation, a mechanism that generates escape mutants not recognized by existing host molecules. Genetic variation in M.tb usually occurs through single nucleotide polymorphisms (SNPs) (32). The aim of this study was to determine the genetic diversity of M.tb clinical isolates and to analyze the variation dynamics of the M.tb genome using a multiple genome alignment of five selected sequenced strains. This approach gives insight into the evolution of the M.tb genome and the adaptation of this pathogen to its human host.
One member of the PE_PGRS family, wag22, is a fibronectin-binding protein, and phylogenetic analysis of the family results in an uninformative unrooted tree, indicating a complex evolutionary history for this gene family. PE-PGRS genes are variable in the PGRS domain, which may be associated with antigenic drift. These genes also encode cell-wall proteins that can undergo extensive variation (33). This can also lead to antigenic variation, which may contribute to drug resistance and the development of multidrug-resistant strains. The PE domain contains conserved sequences encoding epitopes for recognition by, and binding to, T cells (32,33).
Sequence variation in a gene often results from duplications and deletions of sequence, in-frame or frameshift insertions and deletions (indels), insertions of transposable genetic elements, alternative start codons and single nucleotide polymorphisms (SNPs). PE_PGRS genes share extensive homologous regions as a result of spontaneous intergenic homologous recombination. Insertion elements play an apparent role in the functional attributes of the mycobacterial genome: insertion of IS6110 elements into PE regions drives mutation and genetic variation that could contribute to phenotypic changes (34). Studies have demonstrated significantly greater genetic variation in PE_PGRS genes than in the genome as a whole. The large differences in nucleotide and indel diversity among individual PE_PGRS genes are an indication of neutral diversity. The dN/dS ratio (the ratio of non-synonymous to synonymous substitution rates) is also useful for assessing the selective pressure on a gene. Extensive nucleotide diversity among the genes suggests that purifying and diversifying selection have distributed distinct functional roles across them. These regions are hotspots for recombination and for insertion and deletion of sequence, which may help the organism evade the host immune system and ensure its survival and disease progression. Sequence diversity may also promote adaptation to a wide range of host populations rather than to an individual host. These regions are also differentially regulated in individual strains.
Furthermore, pathogens have evolved numerous mechanisms to restrain host immune responses. One of the most common, antigenic variation (a change in the surface antigens of a microorganism), consists of generating escape mutants that are not recognized by existing host molecules (35).
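Two quantities from the discussion of sequence variation above can be sketched in code: whether an indel preserves the reading frame (its length change is a multiple of three) and a dN/dS ratio. For dN/dS, the sketch assumes the counts of non-synonymous and synonymous differences and sites have already been obtained (for example, by the Nei-Gojobori method, which is not implemented here) and applies the Jukes-Cantor correction to each proportion. By the usual interpretation, dN/dS below 1 suggests purifying selection and above 1 suggests diversifying selection. The function names and example counts are hypothetical.

```python
import math


def classify_variant(ref: str, alt: str) -> str:
    """Classify a coding-sequence variant from its reference and alternative alleles."""
    if not ref or not alt:
        raise ValueError("alleles must be non-empty (VCF-style, with an anchor base)")
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "multi-nucleotide substitution"
    # An indel keeps the reading frame only if the length change is a multiple of 3.
    return "in-frame indel" if abs(len(ref) - len(alt)) % 3 == 0 else "frameshift indel"


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """dN/dS from precomputed counts of differences and sites (counting not shown)."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0.0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return d_n / d_s


print(classify_variant("A", "G"))     # single base change
print(classify_variant("A", "AGGC"))  # 3-base insertion
print(classify_variant("AG", "A"))    # 1-base deletion
# Hypothetical counts: 10 of 300 non-synonymous and 5 of 100 synonymous sites differ.
print(round(dn_ds(10, 300, 5, 100), 3))
```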
There is good experimental evidence to support the association of PE proteins with surface exposure at the host-pathogen interface. Membrane or cell-wall localization of the PE protein family is supported by proteomic studies of over 35 PE proteins (35,36). From a genomic perspective, the expansion of the PE gene family from non-pathogenic to present-day pathogenic sublineages is the result of genetic variation that has led to highly polymorphic regions. Hypervariable regions are present in the most recent sublineages, i.e., sublineages IV and V (36).
The PE-PGRS genes encode proteins of the M. tuberculosis cell wall and cell membrane. They need to be analyzed for insertions and deletions (indels), mutations that insert or delete a segment of DNA in an organism's genome, because indels contribute to genetic diversity and can be under selective pressure. Despite encouraging progress in reducing TB mortality and incidence, the disease remains a global health problem. Since the identification of the TB bacillus in 1882, many approaches have been taken to tackle the disease, including the search for a suitable diagnostic method; a sensitive, rapid and cost-effective diagnostic platform remains a challenge. Current TB detection methods mainly employ sputum smear microscopy, culture of bacilli and molecular diagnostics (36). However, smear microscopy has a sensitivity of only approximately 50% and performs particularly poorly in HIV-infected patients (36). Culture of the bacillus can be considered the standard method for TB detection in a clinical setting, but results are not available quickly because M.tb requires 3-6 weeks to grow on solid culture media and 9-16 days in rapid liquid culture media.
Moreover, the turnaround time of molecular tests that involve multiple hybridization steps is 2-5 h, and their requirement for sophisticated infrastructure and trained personnel limits their feasibility in developing countries. With advances in nanomaterial synthesis combined with analytical platforms, gold nanoparticles (AuNPs) have been adopted as versatile biosensors that offer alternatives to conventional detection methods for TB diagnosis (37). There is a need to fully characterize PE/PPE family sequence diversity across strain types to provide a better understanding of these genes and their possible role in immune evasion and virulence. The availability of high-throughput short-read sequencing technologies has advanced the study of M. tuberculosis genetic diversity. To characterize these complex genes, whole-genome assembly has been performed on next-generation sequencing data with a high depth of coverage across the PE/PPE gene regions from 518 clinical and laboratory isolates. These isolates represent the four major lineages, each with known informative barcoding SNPs (38).
This study focuses on several strains from the sub-Saharan African region in order to examine the genetic variability of the PE-PGRS gene family. It will describe the variability and virulence levels of PE-PGRS genes in this region, using DNA sequencing to determine nucleotide diversity, a measure of polymorphism within a population (39). This approach will also help us design an R script to detect the binding site of each PE_PGRS gene and locate its binding position. The purpose of the investigation is to gain in-depth knowledge of the genomic structure of M.tb, focusing on the mycobacterial PE/PPE genes. It has been widely speculated that these proteins play an important role in the evasion of host immune responses, possibly via antigenic variation. The PE-PGRS gene family is the most variable region of the M.tb genome, and its variability has been associated with increased antigenicity; it is therefore thought to be involved in evasion of the human host. The role of the PE-PGRS gene family in increased virulence and pathogenicity in sub-Saharan Africa is not well understood. We suspect that the PE-PGRS gene family contributes to evasion of the human host immune system. I believe that strains carrying the PE-PGRS gene family have different antigenic variations depending on their nucleotide diversity.
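Nucleotide diversity (π) is commonly estimated as the average proportion of differing sites over all pairs of sequences in a sample. The following is a minimal sketch, assuming the sequences are already aligned to a common length and using pairwise deletion of gap columns, a choice the text does not specify; the toy alignment is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Average proportion of differing sites over all pairs of aligned sequences.

    Columns where either sequence of a pair has a gap ('-') are skipped for that pair.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    pair_distances = []
    for a, b in combinations(seqs, 2):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to the same length")
        sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
        if not sites:
            raise ValueError("a pair of sequences shares no ungapped columns")
        pair_distances.append(sum(x != y for x, y in sites) / len(sites))
    return sum(pair_distances) / len(pair_distances)


# Toy alignment (hypothetical; not real PE_PGRS alleles).
sample = ["ACGT", "ACGA", "ACTA"]
print(round(nucleotide_diversity(sample), 4))
```

Averaging per-pair proportions keeps each comparison on its own set of ungapped columns, which is why sequences with gaps in different places can still be compared.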

Unformatted Attachment Preview

CHAPTER I. INTRODUCTION Exploratory Analysis of the Genetic Variability of the PE-PGRS Gene Overview The global burden of tuberculosis (TB) is staggering (almost 9 million cases per annum) (1). India and China have the largest incidence (This is the number of new cases of active TB disease ).India is estimated to have 2.2 million incident cases of TB in 2015 (World Health Organization (WHO), 2016)while China is the second largest TB epidermic with an estimated incidence of 930,000 recorded in 2014 http://www.tbfacts.org/china/ Visited 18/06/2017 . South Africa is one of the countries that has the largest burden (i.e. the largest number of persons infected per capita)of TB with an estimated incidence of 450,000 cases of active TB in 2013 (WHO, 2016). In South Africa, TB is the leading cause of death. The country also has the highest burden of HIV co-infections globally (estimated at approximately 7,03 million in 2016 ) (1). Consequently, individuals that are co-infected with HIV are driving the epidemic and account for more than 60% of the new TB cases (2). In HIV-infected persons, latent TB infection (LTBI) co-infection rates are high (> 80%) and progression from LTBI to active TB is occurring at a rate of up to 10% per annum, rather than 10% over one's lifetime (3). This persistent reservoir of infection is subverting eradication efforts and this is primarily because current tools to detect LTBI; as well as, differentiate it from subclinical or active TB, are sub-optimal (3). We also know very little about the fundamental biology of LTBI including the physical nature of LTBI and its transcriptional profile in human tissues (4). For instance, the granuloma is the characteristic hallmark of mycobacterial disease (5), and presumed LTBI is not often visually apparent. 
One report showed that viable mycobacteria were cultured from macroscopically normal tissue when inoculated into guinea pigs (5), and several worldwide reports that date back as early as 1900 (Otto Naegeli, 1900. Virchows Arch. Path. Anat., 160,426) showed that mycobacteria could be cultured from the lungs of individuals that died from other diseases. We also know from large controlled trials that treating TST+ve (tuberculin skin test – positive) persons without active TB reduces the development of active TB by 60 to 80% (5) but conversely the positive predictive value of tests like the IGRA (Interferon Gamma Release Assay) and TST for the development of active TB is only about 2 to 3% in HIV co-infected persons (6). Collectively, these data indicate that although TB infection is a spectrum of disease (7) the pathophysiology and associated biological processes of viable and persistent mycobacteria found in tissues that are not culturable remain unclear. Mycobacterium tuberculosis (M.tb) is an obligate pathogenic bacterial species in the family Mycobacteriaceae and it is caused by an agent known as TB. M.tb H37Rv strain genome was first published in 1998.[32] It estimated to be 4 million base pairs in size with approximately 3,959 genes; 40% of the genes have been characterized. The genes features include 40% lipids, ---% carbohydrate and ----% of ….Importantly, roughly 10% of the coding capacity is taken up by the PE/PPE gene families that encode acidic, glycine-rich proteins. These proteins have a conserved N-terminal motif, deletion of which impairs growth in macrophages and granulomas.[35] There is considerable interest in this gene family for regulation of immune cells, as well as there role in the clinical manifestation of TB and LTBI. A better understanding of the genetic variability of this gene family would potentially shed light on their role as virulence factors. One of the surprises emerging from the analysis of the first sequenced M. 
tb genome (the laboratory strain H37Rv) was the discovery of two large gene families, designated pe and ppe, that in H37Rv comprise 99 and 69 members respectively and together account for around 10% of the organism's genomic coding potential [5]. Pe genes are characterised by the presence of a proline-glutamic acid (PE) motif at positions 8 and 9 within a highly conserved N-terminal domain consisting of around 110 amino acids. Similarly, ppe genes contain a proline-prolineglutamic acid (ppe) at positions 7–9 in a highly conserved N-terminal domain of approximately 180 amino acids. The C-terminal domains of both pe and ppe protein families are highly variable in both size and sequence and often contain repetitive DNA sequences that differ in copy number between genes [5]. The pe and ppe gene families can be divided into sub-families based on similarities in their Nterminal regions and the phylogenetic relationships between each gene sub-family have been previously described, demonstrating that their evolutionary expansions are linked to the duplications of the ESAT-6 (esx) gene clusters [8]. Ppe genes can be subdivided into 5 subfamilies, the most numerous of which are the ppe_svp (24 members) and the ppe_mptr (major polymorphic tandem repeat) subfamilies (23 members) (Fig. 1a). Pe genes can also be divided into 5 sub-families, the largest of which, the polymorphic GC-rich-repetitive sequence (pe_pgrs), comprises 65 members in H37Rv (Fig. 1b). This sub-family is characterised by a Cterminal domain that contains multiple tandem repeats of a glycine-glycine-alanine (Gly-GlyAla) or a glycine-glycine-asparagine (Gly-Gly-Asn) motif. Phylogenetic analysis indicates that the emergence of the large pe_pgrs and ppe_mptr subfamilies is a recent evolutionary event, with their presence being restricted to members of the MTBC and close relatives such as M. marinum and M. ulcerans [8]. 
To understand the evolutionary relation of PE AND PPE genes in M.tb , a phylogenetic tree based on multiple sequence alignment of all the proteins encoded by members of the two gene families was constructed (Figure, 3). Tree was constructed from ninety-six PE protein family Nterminal sequence. Also, PPE protein from ESAT-6 (esx) gene cluster region 1, (Rv3873) was chosen as an out-group. The tree indicated same typology that was conserved when the complete protein sequence were used for analysis. Figure 2. A schematic demonstration of PE/PPE family sub-groups. PE and PPE proteins have comparatively conserved N-terminal domains. Each one of these proteins has a chance to split into discrete sub groups based on their variable c-terminal domains. The major participants to the genome polymorphism are the regions encoded by the polymorphic GC-rich sequence (PGRS) which is the PE family and the major polymorphic tandem repeats (MPTR). The PE-PGRS gene family is associated with membrane pore formation and protease processing, antigen secretion for T-cell. This PE_PGRS gene is unique to MTB, PE_PGRS family. Additionally, it has been widely proposed as molecular mantra to deflect host immunity (11). These genes are interesting because the orthologue of the gene in the closely-related genome of M.bovis is a pseudogene, the absence of which could potentially play a role in influencing host or tissue tropism. Figure 2. Phylogenetic trees of the PE and PPE proteins, respectively, present within the ESAT-6 (esx) gene clusters in M. tuberculosis H37Rv, demonstrating a duplication order similar to that observed with other genes in the M. tuberculosis ESAT-6 (esx) gene cluster regions [1]. (A) PE proteins, (B) PPE proteins (C) ESAT-6/CFP-10 proteins. PE/PPE gene pairs are frequently associated with the ESAT-6 (esx) gene clusters in M.tuberculosis. 
Regions 1, 3, 2 and 5 of ESAT-6 harbour the ancestral genes of the two families (PE/PPE). The tree topologies in the figure correspond to each other, which suggests a co-evolutionary history for the two gene families. This evolutionary scenario is also congruent with the evolutionary history determined for the five ESAT-6 (esx) gene clusters, with duplications of the PE and PPE genes contained in and associated with these regions expanding sequentially from region 1 to region 3, then region 2 and lastly region 5. The topology of the phylogenetic trees suggests that the PE_PGRS and PPE_MPTR sub-families are the result of the most recent evolutionary events and evolved from the sublineage that includes the PE and PPE genes of ESAT-6 (esx) gene cluster region 5. Region 4 is also present outside the genus Mycobacterium, and the absence of PE and PPE genes from this most ancestral ESAT-6 region indicates that they were integrated into a duplicate of the region and then co-duplicated with the rest of the regions. Phylogenetic analysis of the PE family and ESAT-6 genes supports this co-duplication hypothesis (12). The PE genes situated in region 5 of the cluster are present in multiple copies, indicating that this region is highly prone to duplication (Figure 2). Complete genome sequencing of an organism provides a wealth of information about its phenotype and evolution and can be used to trace the evolution of genes and gene families, such as the expansion of the PE family in the ESAT-6 gene cluster regions. To construct a phylogenetic tree and a robust evolutionary history, it is important to identify the most ancestral representative of the family. Orthologues of the family's genes are identified by comparative genomics, which compares the genomes of different species to find their differences and similarities. In the case of PE_PGRS, ESAT-6 cluster region 1 is the most ancestral and region 5 evolved most recently (15).
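One common comparative-genomics heuristic for calling orthologues, not necessarily the one used in the cited work, is the reciprocal best hit: gene a in genome A and gene b in genome B are paired when each is the other's highest-scoring match. The Python sketch below assumes the similarity scores (for example, BLAST bit scores) were computed beforehand; all gene names and scores are hypothetical.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query gene to its highest-scoring subject gene.

    On a tie, the first pair seen keeps its place (strict '>' comparison)."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: dict[tuple[str, str], float],
    b_vs_a: dict[tuple[str, str], float],
) -> list[tuple[str, str]]:
    """Gene pairs (a, b) in which each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores between M. tb genes (tb_*) and M. smegmatis genes (ms_*).
    tb_vs_ms = {("tb_1", "ms_1"): 250.0, ("tb_1", "ms_2"): 90.0,
                ("tb_2", "ms_1"): 200.0, ("tb_2", "ms_2"): 120.0}
    ms_vs_tb = {("ms_1", "tb_1"): 245.0, ("ms_1", "tb_2"): 198.0,
                ("ms_2", "tb_1"): 88.0, ("ms_2", "tb_2"): 118.0}
    print(reciprocal_best_hits(tb_vs_ms, ms_vs_tb))  # [('tb_1', 'ms_1')]
```

In this toy case tb_2 has no reciprocal partner: its best hit, ms_1, prefers tb_1. One-sided hits of this kind can arise in multi-copy families such as PE_PGRS, where recent duplications blur one-to-one orthology.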
The regions represent PE/PPE proteins present within the ESAT-6 (esx) gene clusters in M. tuberculosis H37Rv. The proline and glutamic acid residues of the PE motif are found near the N-terminus in most cases. The conserved N-terminal region, of approximately 110 amino acids, is characterized by a proline-glutamic acid (PE) motif at positions 8 and 9, while the C-terminal region varies in size, sequence and repeat number. The PE family is subdivided into several subfamilies, of which PGRS is the largest, with 65 members (20). The tandem repeats in the C-terminal domain of this subfamily are glycine-glycine-alanine (Gly-Gly-Ala) or glycine-glycine-asparagine (Gly-Gly-Asn) motifs. The other PE subfamily, with 34 members, shows low homology in its C-terminal domain (21). Previous studies of members of the PPE family have strongly indicated that they are likely to be cell-wall associated, and certain PE_PGRS proteins have been shown to be cell-surface constituents that influence cellular structure and colony morphology. These studies also indicate that the remarkable length and sequence variation of the PE_PGRS proteins are due to long stretches of GC-rich sequence, which can be interrupted by regions of divergent sequence up to 60 amino acids long. Because of the repetitive, redundant sequence of the PGRS domain, the PE_PGRS genes are thought to be hotspots for recombination and single nucleotide polymorphisms (SNPs). Several analyses have shown that clinical isolates of the MTBC harbor polymorphisms in PE_PGRS genes; for example, PE_PGRS33 has been found to carry sequence variation in 68% of clinical isolates. The driving force behind this sequence diversity may be immune recognition by the host (22). PE proteins may inhibit antigen processing through interaction with the immune system. In support of this, one report showed that a DNA vaccine construct based on the N-terminal PE region of a PE_PGRS protein can elicit an immune response, whereas a construct containing the whole PE_PGRS protein cannot.
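The Gly-Gly-Ala / Gly-Gly-Asn repeat structure of the PGRS domain can be made concrete with a small Python sketch working on one-letter amino-acid strings. It counts motif occurrences and the longest uninterrupted run of motifs. The function names and the fragment are hypothetical, and the sketch ignores imperfect repeat units and the divergent stretches of up to 60 residues that interrupt real PGRS domains.

```python
import re

# One-letter amino-acid codes: Gly-Gly-Ala is "GGA", Gly-Gly-Asn is "GGN".
MOTIF = re.compile(r"GG[AN]")
TANDEM_RUN = re.compile(r"(?:GG[AN])+")


def count_motifs(c_terminal: str) -> int:
    """Number of non-overlapping Gly-Gly-Ala / Gly-Gly-Asn tripeptides."""
    return len(MOTIF.findall(c_terminal.upper()))


def longest_tandem_run(c_terminal: str) -> int:
    """Length, in motif units, of the longest back-to-back run of GGA/GGN."""
    runs = TANDEM_RUN.findall(c_terminal.upper())
    return max((len(run) // 3 for run in runs), default=0)


if __name__ == "__main__":
    # Hypothetical PGRS-like fragment, not a real PE_PGRS sequence.
    fragment = "PGGAGGNGGAFGGAS"
    print(count_motifs(fragment))        # 4
    print(longest_tandem_run(fragment))  # 3
```

Because every motif starts with "GG" and ends with A or N, two motifs can never overlap, so a simple left-to-right regex scan counts them correctly.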
This suggests that the PGRS repeat region can influence antigen processing and presentation. There is also evidence of a virulence role for the PE_PGRS region from non-pathogenic Mycobacterium species: in the non-pathogenic, fast-growing M. smegmatis, the PE_PGRS region helps the bacteria survive in infected macrophages. The PGRS region is associated with bacterial shape and colony morphology and is necessary for subcellular localization (21,22). One way in which PE and PPE proteins may interact with host immunity is through inhibition of antigen processing. This is supported by the observation that a DNA vaccine construct based on the conserved N-terminal PE region of a PE_PGRS protein elicits a cellular immune response, whereas a construct containing the PGRS domain is unable to do so, indicating that the PGRS repeats are somehow capable of influencing antigen processing and presentation (23). Complete genome sequencing has provided a wealth of information on the evolution and distribution of the PE and PPE genes in the genus Mycobacterium. Before it, little was known about these genes and their association with the ESAT-6 gene clusters; with the information obtained from sequencing, it is now possible to trace the evolution of genes and gene families (23). Because the PE and PPE genes are found only in the genus Mycobacterium, analyzing them can greatly help us understand M. tuberculosis. PE/PPE gene pairs are frequently associated with the ESAT-6 (esx) gene clusters in M. tuberculosis, and analysis of the constituent genes shows a duplication order from the ancestral region 4 (Rv3444c-Rv3450c) to region 1 (Rv3866-Rv3883c), region 3 (Rv0282-Rv0292), region 2 (Rv3884c-Rv3895c) and lastly region 5 (Rv1782-Rv1798) (22,23). Figure 3. Phylogenetic reconstruction of the evolutionary relationships between the members of the PE protein family. The phylogenetic tree was constructed from phylogenetic analysis of the 110-amino-acid N-terminal domains of the PE proteins.
The tree was rooted to the outgroup, Rv3872 (PE35), shown to be the first PE insertion into the ESAT-6 (esx) gene clusters (region 1). The genes highlighted in purple, green and blue are present in ESAT-6 (esx) gene cluster regions 1, 3 and 2, respectively. Genes highlighted in red are present in, or have previously been shown to be duplicated from, ESAT-6 (esx) gene cluster region 5 [1], and genes highlighted in yellow are members of the PGRS subfamily of the PE family. Arrows indicate orthologues of genes identified in the M. smegmatis genome sequence. Five sublineages (including the PE_PGRS subfamily) are indicated by Roman numerals. Drug resistance in TB has increased over the years, and M. tuberculosis has evolved numerous mechanisms to overcome, evade or exploit host immune responses. One such mechanism, antigenic variation, consists of generating escape mutants that are not recognized by existing host molecules. The emergence of multidrug-resistant and extensively drug-resistant M. tb strains has prompted an urgent need for new diagnostics and therapies (24). There is a notable link between the PE/PPE families and mycobacterial lipid metabolism. Upon entry into the macrophage, M. tuberculosis undergoes physiological adaptations to the intracellular environment, including upregulation of lipid metabolism genes. It scavenges the host's triacylglycerol (TAG) and incorporates the fatty acids into its lipid storage bodies or converts them into mycobacterial lipids, using up these lipid inclusions in times of starvation, oxygen depletion or pathogen reactivation (25). Upon starvation, M. tuberculosis upregulates the expression of lipase genes, including two members of the PE family: LipY (PE_PGRS63, sublineage V) and LipX (PE11, sublineage IV) (25). LipY is a PE protein secreted by ESX-5 into the extracellular milieu, where its TAG lipase activity cleaves TAG into diacylglycerol, allowing the bacterium to scavenge TAG as a nutrient source. Infection is established upon entry of M. tb into the host, and several PE/PPE proteins have been linked to inhibition and/or activation of host macrophage activity. The virulence of M. tb and its ability to grow in macrophages depend on the ESX-5 secretion system together with PE/PPE proteins; PE5 and PE15 (sublineage II) and PE4 (sublineage V) can increase mycobacterial survival within the macrophage (26). PE/PPE proteins have been shown to be involved in mycobacterial secretion systems, owing to their frequent localization to the mycobacterial surface, and these families contain lipolytic enzymes, a part of the mycobacterial metabolome with implications for virulence. This shows the relevance of the PE/PPE proteins to M. tb virulence as key cell-wall components that interact with the host for survival (25,26). PE_PGRS33 is one of the 60 PE_PGRS genes in the M. tb genome and encodes a surface-expressed protein believed to be involved in the antigenic variation of M. tuberculosis strains and evasion of the host immune system. No systematic analysis of genetic variation in PE_PGRS33 had been conducted, even though genetic differences between the PE_PGRS33 genes of H37Rv and CDC1551 have been noted (28). Two of the PE_PGRS proteins, PE_PGRS17 and PE_PGRS18 (sublineage V), have tandem repeats of 41-43 amino acid residues in their C-terminal variable regions.
A study of the PE_PGRS genes showed that T-cell recognition plays an insignificant role in diversifying selection; closer examination suggested that the highly immunogenic nature of selected family members is driven by a high degree of immunological cross-reactivity (29). Another study was carried out on 123 clinical isolates of M. tb to better understand the genetic basis of the role of PE_PGRS in antigenic variation and evasion of the host immune system. Using PCR and DNA sequencing, about 68% of the isolates were found to have at least one sequence variation in the PE_PGRS33 gene relative to that of H37Rv (30). Although the functions of the PE_PGRS proteins remain unknown, many experimental studies of the PE and PPE families have led us to speculate that they contribute to genetic variation and, through it, to antigenic variation, a mechanism that generates escape mutants that are not recognized by existing host molecules. Genetic variation in M. tb usually occurs through single nucleotide polymorphisms (SNPs) (32). The aim of this study was to determine the genetic diversity of M. tb clinical isolates and to analyze the variation dynamics of the M. tb genome using a multiple genome alignment of five selected sequenced strains. This approach gives insight into the evolution of the M. tb genome and the general process of adaptation of this pathogen to its human host. Analysis of one member of the PE_PGRS family, wag22, a fibronectin-binding protein, resulted in an uninformative unrooted tree, indicating a complex evolutionary history for this gene family. The PE_PGRS genes are variable in the PGRS domain, which may be associated with antigenic drift. These genes also encode cell-wall proteins that can undergo extensive variation (33). This can lead to antigenic variation, which may be linked to drug resistance and the development of multidrug-resistant strains.
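The per-isolate comparison behind a figure such as "68% of isolates with at least one variation relative to H37Rv" can be sketched in Python. This sketch assumes each isolate's PE_PGRS33 sequence has already been aligned to the H37Rv reference and counts only single-base substitutions. The sequences and names are hypothetical, and the cited study's PCR and sequencing workflow is not reproduced here.

```python
def snp_positions(reference: str, isolate: str) -> list[int]:
    """1-based alignment columns where the isolate carries a different base.

    Only substitutions are reported; columns with a gap ('-') in either
    sequence are skipped, so indels would need separate handling."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return [
        column
        for column, (ref_base, iso_base) in enumerate(
            zip(reference.upper(), isolate.upper()), start=1
        )
        if ref_base != "-" and iso_base != "-" and ref_base != iso_base
    ]


def fraction_with_variants(reference: str, isolates: dict[str, str]) -> float:
    """Share of isolates with at least one SNP relative to the reference."""
    if not isolates:
        raise ValueError("no isolates given")
    carrying = sum(1 for seq in isolates.values() if snp_positions(reference, seq))
    return carrying / len(isolates)


if __name__ == "__main__":
    # Hypothetical aligned fragments; "ref" stands in for an H37Rv segment.
    ref = "ATGGCAGGA"
    isolates = {
        "iso1": "ATGGCAGGA",  # identical to the reference
        "iso2": "ATGACAGGT",  # two substitutions
        "iso3": "ATG-CAGGA",  # deletion only, not counted as a SNP
    }
    for name, seq in isolates.items():
        print(name, snp_positions(ref, seq))
    print(round(fraction_with_variants(ref, isolates), 3))
```

Counting indels as well (as "sequence variation" usually does) would mean also flagging gap columns, which this substitution-only sketch deliberately leaves out.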
The PE domain contains conserved sequences encoding epitopes that are recognized and bound by T cells (32,33). Variation in gene sequence often results from sequence or nucleotide duplications and deletions, in-frame or frameshift insertions and deletions, insertions of transposable genetic elements, alternative start codons and single nucleotide polymorphisms (SNPs). PE_PGRS genes share extensive homologous regions as a result of spontaneous homologous recombination between genes. Insertion elements play an evident role in the function of the mycobacterial genome: insertion of IS6110 elements into PE regions drives mutation and genetic variation that could contribute to phenotypic changes (34). Studies have demonstrated significant genetic variation in PE_PGRS genes compared with the whole genome. The extent of nucleotide and insertion/deletion diversity differs greatly between individual PE_PGRS genes, which has been taken as an indication of neutral diversity. In addition, ratios of non-synonymous to synonymous substitution rates (dN/dS) are useful for characterizing the diversity of a gene. Extensive nucleotide diversity among these genes suggests that purifying and diversifying selection act to distribute distinct functional roles across the genes. These regions are hotspots for recombination and for insertion and deletion of sequence, which also helps the organism evade the host immune system and ensure its survival and disease progression. Sequence diversity may also promote adaptation to a wide range of host populations rather than to an individual host. These regions are also differentially regulated in individual strains. Furthermore, pathogens have evolved numerous mechanisms to restrain host immune responses. One of the best-known mechanisms, antigenic variation, which is the change in the surface antigens of a microorganism, consists of generating escape mutants that are not recognized by existing host molecules (35).
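The dN/dS ratio mentioned above can be written out explicitly. The version below follows the widely used Nei-Gojobori counting approach with a Jukes-Cantor correction; it is shown for illustration only, since the text does not say which estimator the cited studies used. Here N and S are the numbers of non-synonymous and synonymous sites in the compared sequences, and N_d and S_d are the observed non-synonymous and synonymous differences.

```latex
\begin{aligned}
p_N &= \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S},\\
d_N &= -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_N\right), \qquad
d_S = -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_S\right),\\
\omega &= \frac{d_N}{d_S}
\end{aligned}
```

Values of ω below 1 indicate purifying selection, values near 1 are consistent with neutral evolution, and values above 1 indicate diversifying (positive) selection.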
There is good experimental evidence supporting the association of PE proteins with surface exposure at the host-pathogen interface. Membrane or cell-wall localization of the PE protein family is supported by proteomic studies of over 35 PE proteins (35,36). From a genomic perspective, the expansion of the PE gene family from non-pathogenic to present-day pathogenic sublineages is the result of genetic variation that has led to highly polymorphic regions. Hypervariable regions are present in the most recent sublineages, i.e., sublineages IV and V (36). The PE_PGRS genes encode cell-wall and cell-membrane proteins of M. tuberculosis and need to be analyzed for insertions and deletions (mutations that insert or delete a segment of DNA in an organism's genome), which contribute to genetic diversity and can be under selective pressure. However, despite encouraging progress in reducing TB mortality and incidence rates, the disease remains a global health problem. Since the identification of the TB bacillus in 1882, many approaches have been taken to tackle this disease, but a sensitive, rapid and cost-effective diagnostic platform is still a challenge. Current TB detection methods mainly employ sputum smear microscopy, culture of bacilli and molecular diagnostics (36). However, sputum smear microscopy has a sensitivity of only approximately 50%, and performs especially poorly in HIV-infected patients (36). Bacillus culture can be considered the standard method for TB detection in a clinical center, but results cannot be obtained quickly because M. tb requires 3-6 weeks to grow on solid culture media and 9-16 days in rapid liquid culture media.
Moreover, the turnaround time of molecular tests, which involve multiple hybridization steps, is 2-5 h, and the requirement for sophisticated infrastructure and trained personnel limits the feasibility of applying these tests in developing countries. With advances in the synthesis of various nanomaterials in combination with analytical platforms, gold nanoparticles (AuNPs) have been adopted as versatile biosensors that provide alternatives to conventional detection methods for TB diagnosis (37). There is a need to fully characterize PE/PPE family sequence diversity across strain types to provide a better understanding of these genes and their possible role in immune evasion and virulence. The availability of high-throughput short-read sequencing technologies has advanced the study of M. tuberculosis genetic diversity. To characterize these complex genes, whole-genome assembly was performed on next-generation sequencing data with a high depth of coverage across the pe/ppe gene regions from 518 laboratory and clinical isolates. These isolates represent the four major lineages, each with known informative barcoding SNPs (38). This study focuses on several strains, specifically from the sub-Saharan Africa region, to examine the genetic variability of the unique PE_PGRS gene family. The study will thus describe the variability and virulence levels of the PE_PGRS genes in the African region, using DNA sequencing to determine nucleotide diversity, which is a measure of polymorphism within a population (39). This approach will also help us design an R script to detect the binding site of each PE_PGRS gene and locate the precise binding position. The purpose of the investigation is to gain in-depth knowledge of the genomic structure of M. tb, focusing mainly on the mycobacterial PE/PPE genes. It has been widely speculated that these proteins may play an important role in the evasion of host immune responses, possibly via antigenic variation.
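Nucleotide diversity (π) is commonly defined as the average number of nucleotide differences per site between two sequences drawn from the sample. A minimal Python sketch follows, assuming the sequences are already aligned to equal length (gap columns are skipped pair by pair); it averages the per-site differences over all pairs. The function names and sequences are illustrative and are not part of the study's own pipeline, which the text says will be written in R.

```python
from itertools import combinations


def per_site_difference(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in sites) / len(sites)


def nucleotide_diversity(sequences: list[str]) -> float:
    """Mean per-site difference over all unordered pairs of sequences (pi)."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(sequences, 2))
    return sum(per_site_difference(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned fragments from four isolates.
    isolates = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
    print(round(nucleotide_diversity(isolates), 4))  # 0.1167
```

Averaging over every pair of sampled sequences is equivalent to weighting distinct haplotypes by their frequencies, so identical isolates are simply counted as separate entries in the list.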
The PE_PGRS gene family is among the most variable regions of the M. tb genome. The variability of this region has been associated with increased antigenicity and is therefore thought to help the bacterium evade its human host. The role of the PE_PGRS gene family in increased virulence and pathogenicity in sub-Saharan Africa is not well understood. We suspect that the PE_PGRS gene family contributes to evasion of the human host immune system. I believe that strains carrying the PE_PGRS gene family show different antigenic variations depending on the nucleotide diversity of these genes.