Description
I want the writer to write a thesis for me. I will upload a file that describes everything I want, and when I have finished all the results I will upload them too. If you have any questions, please send me an email.
Explanation & Answer
Kindly see attached file.
The file is divided into:
- Title page
- Table of contents
- Acknowledgements
- Abstract
- Introduction
- Materials and methods
- Results
- Discussion of results
- Conclusions
- References
(TITLE)
(NAME)
Master Thesis
(UNIVERSITY)
(DATE)
Table of contents
Acknowledgements
Abstract
1. Introduction
1.1. What is molecular modelling?
1.2. What are the steps involved in molecular modelling for the assessment of the docking reaction between peptides and proteins?
1.3. Previous research: Development of a benchmark data set for the assessment of peptide docking by LEADS-PEP
2. Objectives
3. Methods
3.1. Materials
3.2. Assessment of docking in the LEADS-PEP benchmark data set by Vina molecular modelling method
3.3. Assessment of docking in the LEADS-PEP benchmark data set by Plants molecular modelling method
4. Results
5. Discussion of the results
6. Conclusions
References
Acknowledgements
I would like to thank the University of …. for having provided me with all the necessary means to carry out the research presented in this Master Thesis. Furthermore, I would like to thank my tutor …. for his/her guidance and support throughout the whole process.
I am especially thankful to my family and friends for their constant support and encouragement during this year and throughout my academic career.
Finally, I would like to thank the …. for the financial assistance provided, which has enabled me to carry out the work described in this Master Thesis.
Abstract
Two different modelling programs, namely VINA and PLANTS, have been analyzed for their ability to predict the docking behavior of the small peptides contained in the LEADS-PEP database.
The comparison of the obtained results confirmed previously published results according to which the VINA software seems to be especially useful for evaluating the docking of small peptides, even though it was not initially developed for this purpose. In this regard, VINA gave a lower RMSD value than PLANTS for 45 out of the 53 peptides analyzed in this study, and an RMSD of at most 2.5 Å for 47 of them. In contrast, the PLANTS modelling software reached this threshold for only 28 out of the 53 analyzed peptides.
1. Introduction
1.1. What is molecular modelling?
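Molecular modelling is the scientific discipline that uses computers to build, represent and predict the structure and behavior of chemical molecules. Many of its models are based on quantum mechanics and therefore on the solution of the time-independent Schrödinger equation presented below:
$\hat{H}\Psi = E\Psi$
where $\hat{H}$ represents the Hamiltonian operator, $\Psi$ is the wave function and $E$ is the energy of the system. Other approaches, such as molecular mechanics force fields and the empirical scoring functions used by docking programs, treat atoms classically so that large systems can be handled.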
This equation can be solved exactly for the hydrogen atom, but its solution becomes highly complicated for molecules with many electrons and nuclei. For this reason, several approximations have been developed in order to calculate approximate solutions for complex systems such as the proteins and small peptides analyzed in the present thesis.
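As an illustration of the exactly solvable case, the bound-state energies of the hydrogen atom (neglecting relativistic effects and nuclear motion) follow directly from the solution of the Schrödinger equation above:

$$E_n = -\frac{m_e e^4}{8\varepsilon_0^2 h^2}\,\frac{1}{n^2} \approx -\frac{13.6\ \text{eV}}{n^2}, \qquad n = 1, 2, 3, \ldots$$

where $m_e$ is the electron mass, $e$ the elementary charge, $\varepsilon_0$ the vacuum permittivity, $h$ the Planck constant and $n$ the principal quantum number. No comparable closed-form solution exists once two or more electrons interact, which is why the approximations mentioned above are needed.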
Molecular modelling enables scientists to predict not only the structure of a given molecule, but also its reactivity under pre-established conditions. This provides a straightforward way to validate the results of molecular modelling: a predicted property, such as the acidity of the molecule, can be compared with the value obtained experimentally, for instance from an acid-base titration. In this regard, it should be noted that, since several approximations are used during the modelling process, only an approximate result will be obtained in most cases. However, the possibility of validating the calculated results in this way is critical for deciding which calculation method is best for a given application. The common practice is therefore to model the chemical structure of the molecule with different approximation methods and then compare the results to evaluate which one best describes the experimentally observed variable.
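The comparison described above amounts to choosing the method whose predictions deviate least from experiment. The following Python sketch assumes that the root-mean-square error (RMSE) is used as the measure of deviation; the method names and all values are hypothetical and serve only as an illustration.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between paired predicted and observed values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def best_method(predictions, observed):
    """Return (name, RMSE) of the method whose predictions deviate least from experiment."""
    errors = {name: rmse(values, observed) for name, values in predictions.items()}
    name = min(errors, key=errors.get)
    return name, errors[name]


# Hypothetical data: experimental pKa values of three molecules (e.g. from
# acid-base titrations) and the values predicted by two hypothetical methods.
observed_pka = [4.76, 4.20, 9.25]
predictions = {
    "method_A": [5.10, 4.60, 8.70],
    "method_B": [4.80, 4.10, 9.40],
}
name, error = best_method(predictions, observed_pka)
print(f"best method: {name} (RMSE = {error:.2f} pKa units)")
```

The same pattern applies to any experimentally measurable property: the method with the smallest error is retained for that application.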
Molecular modelling techniques are commonly used in physical chemistry, inorganic chemistry and biochemistry. They have been extensively applied to predict the structure and reactivity of small organic molecules, polymers, inorganic solids, inorganic liquids, liquid crystals and proteins. In biochemistry, molecular modelling techniques are especially useful for the conformational analysis of proteins, peptides and enzymes, among others. Hence, they can be used to predict the three-dimensional structure of a protein from its amino acid sequence, or the structure of its active site.
The main appeal of such models is their use in the design of new drugs. However, the application of molecular modelling to drug design has only been possible after the exponential increase, over the last five decades, in the number of protein and enzyme structures determined by techniques such as X-ray crystallography and nuclear magnetic resonance (Meyer, Swanson & Williams, 2000). In this regard, the deposition of such structures in databases has enabled scientists to access the required information quickly instead of having to purify the protein and characterize its structure themselves.
Furthermore, the advances in computing experienced in recent years have enabled:
- An exponential decrease in the time required to perform each calculation
- A decrease in the hardware requirements, meaning that anybody can nowadays perform a simple calculation without needing access to especially powerful computers
1.2. What are the steps involved in molecular modelling for the assessment of the docking reaction between peptides and proteins?
Basically, the procedure used in molecular modelling for the assessment of the
complexation or docking reaction between peptides and proteins can be summarized as
follows:
1. Identification and characterization of the target proteins: In this stage, it is
important that the target proteins are identified and characterized through
molecular modelling to obtain their three-dimensional structure (Figure 1) and,
most importantly, the sequence of amino acids present in the active site of the
protein and to which the peptide will need to bind.
Figure 1. Example of a three-dimensional protein structure obtained through molecular modelling (FMSH team, 2009)
2. Once the target proteins have been identified, candidate peptides are selected from the available database according to the identified amino acid sequence and structure of the protein's active site. In this sense, it is important to consider the chemical reactivity between the different functional groups present in the amino acids. Hence, we should look for cysteine residues that might form an intermolecular disulfide bond between the protein and the peptide, or for acidic and basic amino acids, as they could undergo an acid-base reaction and form a salt bridge (an essentially ionic interaction). The formation of these bonds significantly increases the stability of the complex, thereby leading to higher-affinity docking. In addition to chemical reactivity, we should look for possible shape complementarity between the peptide and the protein (Figure 2).
Figure 2. Complex formed between the protein (green) and the peptide (purple) (Huang, 2014)
3. Once a set of candidate peptides that can favorably interact with the protein has been identified, their three-dimensional structure is either retrieved from the database (if available) or modelled. In this stage, special attention is paid to modelling the region of the peptide that will interact with the protein, as well as any surrounding amino acids that could cause steric repulsion and destabilize the protein-peptide complex.
4. When the structures of both the protein and the candidate peptides are well characterized, the docking process is modelled. This can be accomplished with different tools and software. The objective of these programs is to reproduce the peptide-protein complex observed in the co-crystal structure and to characterize the binding strength between the peptide and the protein by comparing the energy of the complex with the energies of the separate peptide and protein.
5. The data obtained by applying the above procedure are interpreted by calculating the root mean square deviation (RMSD). The RMSD compares the predicted position of each atom in the model with its experimental position, taken from the crystallographic or NMR structure deposited in the database. Taking this into account, the best model is the one providing the lowest RMSD value, as this implies that it most closely reproduces the experimental positions of the atoms in the binding site of both the peptide and the protein.
6. Finally, the obtained results are validated through an independent set of experiments. A common practice is to select the two or three peptides that showed the strongest interaction with the protein according to the model and to measure their binding constant experimentally, by preparing different mixtures of peptide and protein and monitoring the change in a physical or chemical property during the reaction. For example, if the model predicts that the peptide and the protein interact through an acid-base reaction, the change in the pH of the solution could be used to follow the docking reaction experimentally.
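The RMSD used in step 5 is usually defined as $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert \mathbf{r}_i^{\mathrm{pred}} - \mathbf{r}_i^{\mathrm{ref}} \rVert^2}$, where the sum runs over the N atoms being compared. The Python sketch below assumes that both poses list the same atoms in the same order and, as is common for docking poses, compares them in the receptor's frame without any superposition. Because it is the square root of a mean of squares, an RMSD can never be negative.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two poses of the same molecule.

    Both arguments are sequences of (x, y, z) coordinates for the same atoms in
    the same order. No superposition is performed: the poses are assumed to be
    expressed in the same (receptor) frame, as for docking poses.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("atom lists must be non-empty and of equal length")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared / len(reference))


# Hypothetical three-atom example: every docked atom is shifted by 1 angstrom along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(docked, crystal))  # 1.0
```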
1.3. Previous research: Development of a benchmark data set for the assessment of peptide docking by LEADS-PEP
Hauser and Windshügel (2016) recently developed the LEADS-PEP benchmark data set, consisting of 53 different protein-peptide complexes, and used it to assess the performance of different docking programs. The formation of such complexes is important from a biological point of view, since these complexes regulate several essential cellular processes. Since they have been estimated to modulate around 40% of the cell's metabolism, their formation or dissociation is currently being used as the basis for peptide-based therapeutics for the treatment of, e.g., cancer, hepatitis C or metabolic diseases.
Taking this into account, the work of Hauser and Windshügel is a very good example of how molecular modelling can assist scientists in developing new drugs that are more effective and have fewer side effects. In this sense, the interaction between the peptide (designed drug) and the protein in the cell is similar to that between an enzyme and its substrate or between an antibody and its antigen. This interaction is therefore presumed to be highly specific and very stable.
However, despite the importance that research in this field would have for the development of a more systematic drug design protocol, very few studies have been carried out so far. In this sense, it should be noted that, even though several programs are currently able to analyze peptide docking, such as the ones used in the present work, a comparison of their performance on a common data set is still missing. Additionally, the fact that the data sets used so far are generally not publicly available is one of the most important drawbacks in applying molecular modelling to the evaluation of protein-peptide docking, as it limits the comparability of the results obtained with the different modelling methods.
Considering, however, that molecular modelling has proven to be a valuable technique for modelling complexation reactions of small organic and inorganic molecules, scientists expect that similar results should be obtained in the evaluation of peptide docking processes.
In this study, Hauser and Windshügel evaluated a total of 53 different peptides of well-characterized sequence contained in the LEADS-PEP database. This database has the advantage of being publicly available. The peptides were selected to cover the full range of acid-base properties, as evaluated through the H-bond donor and H-bond acceptor properties of the different amino acids in the peptide's structure. Doing so enabled the researchers to evaluate the influence of pH on the stability of the different complexes formed between the peptides and the proteins.
The peptide length in this study was limited to a maximum of 12 amino acids since, according to the experiments carried out, the computing time increased exponentially with the length of the peptide, due to the higher number of atoms introduced into the model (Figure 3). In addition to the peptide's length, the computing time also depended on the exact model being used. This can be attributed to the number of simplifications being made, such that simpler models tend to require less computing time but generally provide less accurate results.
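The exponential growth can be rationalized with a simple counting argument, given here as an illustration rather than as the analysis performed by Hauser and Windshügel: each residue adds two rotatable backbone torsions (φ and ψ), so if each torsion is sampled at k discrete values, a flexible peptide of n residues has about k^(2n) backbone conformations. The Python sketch below assumes k = 3 and ignores side-chain torsions and end effects, so it only indicates the order of magnitude of the search space.

```python
def backbone_conformations(n_residues, states_per_torsion=3):
    """Rough count of backbone conformations for a fully flexible peptide.

    Assumes (as a simplification) two freely rotatable backbone torsions
    (phi, psi) per residue, each sampled at `states_per_torsion` discrete
    values; side-chain torsions are ignored.
    """
    torsions = 2 * n_residues
    return states_per_torsion ** torsions


for n in (3, 6, 9, 12):
    print(f"{n:2d} residues: {backbone_conformations(n):,} backbone conformations")
```

Going from 3 to 12 residues multiplies the count by 3^18, which is why the benchmark was restricted to short peptides.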
Figure 3. Effect of peptide length and model on the computing time required to analyze
the structure and stability of the peptide dock to the target protein (Hauser &
Windshügel, 2016)
Another important result found by these authors is that the peptide's length significantly affects the accuracy of the results. In this regard, Hauser and Windshügel found that all the docking programs used in their experiments were able to accurately model the backbone structure of short peptides (with just 3 or 4 residues) but failed to accurately model longer peptides. Hence, as can be observed in Figure 4, the RMSD values obtained for the different models increased with the peptide length.
Figure 4. Average RMSD values obtained for the predicted backbone position of the peptide by the AutoDock method in standard accuracy mode. Similar performance was obtained for other modelling methods.
According to the results of the study, the modelling method providing the most accurate results was Surflex, which accurately reproduced the binding conformation of 13 of the 53 peptides analyzed. It should be noted, however, that the best modelling method, in terms of RMSD, depended on the exact sequence of the peptide being considered, such that each peptide would need to be modelled with all the different methods before deciding which one provides the most accurate result. This represents a very significant drawback, as the need to use several modelling methods dramatically increases the required computing time.
It should be noted, however, that the high accuracy mode did not significantly improve the RMSD values compared with the standard accuracy mode. Taking this result into account, and considering that the high accuracy mode greatly increases the computing time, its use does not seem worthwhile for describing the complexes formed by short peptides.
The main conclusions that can be derived from the work of Hauser and Windshügel (2016) can be summarized as follows:
- The LEADS-PEP database fills an existing gap in the collection of data for assessing the performance of docking software.
- The structures of the peptide-protein complexes contained in the LEADS-PEP database have been characterized using different modelling methods.
- Even though these methods were not originally designed for the docking analysis of peptides, they seem to provide accurate results, especially for small peptides (of just 3 or 4 amino acid residues).
- The high accuracy mode of these modelling methods does not provide more accurate results and significantly increases the computing time. Taking this into account, the standard accuracy mode was selected as optimal for modelling the selected peptide-protein complexes.
- Each particular complex should be modelled with all the available methods before deciding which one provides the most accurate results. Taking this into account, further research is necessary in this field to validate the obtained results.
2. Objectives
Taking this into account, and using the work carried out by Hauser and Windshügel (2016) as a starting point, the main objective of my Master project has been to test the affinity between peptides and proteins by molecular modelling. To do so, I have focused on:
- Evaluating which of the programs (VINA or PLANTS) provides the most accurate results when assessing the docking between peptides and proteins
- Comparing the results obtained by each of the programs, and comparing these results with the previously published work of Hauser and Windshügel
- Understanding how the peptide is docked to the protein by each molecular modelling software, and calculating the respective RMSD values
3. Methods
(You would need to include here the exact settings used in the PLANTS method; you have not uploaded the protocol yet, so I could not write it. Additionally, you should explain how you measured the different values presented in the tables of the Results section. For example, which method did you use to estimate the binding affinity? Note that negative binding affinities are expected: programs such as Vina report an estimated binding free energy in kcal/mol, and more negative values indicate stronger predicted binding. In contrast, an RMSD can never be negative, yet several values in the "Best RMSD pose" column of Table 1 are negative, so you will most probably be asked about them.)
3.1. Materials
The materials used in these experiments consisted of:
- Vega ZZ computer program
- Protein databases: RCSB PDB and LEADS-PEP
- Vina modelling software
- Plants modelling software
3.2. Assessment of docking in the LEADS-PEP benchmark data set by Vina molecular modelling method
The method used to analyze the docking of the proteins and peptides in the LEADS-PEP database with the VINA software can be summarized in the following steps:
1. Select a protein and/or peptide from the LEADS-PEP database
2. Open the selected protein using the Vega ZZ software
3. Remove the water molecules from the system
4. Run the VINA receptor script on the target protein
5. Open the selected ligand (inhibitor) molecule and run the VINA ligand script on it
6. Run the VINA docking of the receptor (protein) and the ligand (inhibitor), using the xyz coordinates of the binding site as the center of the search.
The exact Vega ZZ commands used to follow these instructions are:
- To open the peptide, click on File > Run script > Docking > Vina > Ligand > Run
- To open the receptor, click on File > Run script > Docking > Vina > Receptor.c > Run
- To open a reference peptide, place the mouse pointer on the center of the peptide and right-click to obtain the coordinates (xyz) of the center; click on the center and choose Chain - Change - Atom. A table will appear with the xyz coordinates of the alpha carbon (designated CA) at the center of the target peptide.
Once the xyz coordinates of the protein, ligand and reference peptide have been obtained, the docking process is started using the xyz coordinates obtained for the reference peptide as the center of the search. Docking was performed for the different peptides contained in the LEADS-PEP database, retaining for each the nine best poses, i.e. those with the highest affinity for the receptor and closest to the pose of the reference peptide. The resulting files are then saved with the ".mol2" extension.
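Outside Vega ZZ, the same search space can be written as an AutoDock Vina configuration file. The Python sketch below is an independent illustration, not the Vega ZZ script used in this work: it assumes the box is centered on the centroid of the reference peptide's CA atoms (rather than on a single central CA), and the 22 Å box size and the file names are hypothetical. The keys `center_x`, `size_x`, `exhaustiveness`, `num_modes` and the others are standard Vina options, and `num_modes = 9` matches Vina's default of nine output poses.

```python
def ca_centroid(pdb_text, chain_id):
    """Centroid of the alpha-carbon (CA) atoms of one chain in PDB-format text.

    Uses the fixed PDB columns: atom name 13-16, chain ID 22, x/y/z 31-54.
    Only ATOM records are read, so calcium ions (HETATM "CA") are ignored.
    """
    coords = []
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM") and len(line) >= 54
                and line[12:16].strip() == "CA" and line[21] == chain_id):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"no CA atoms found for chain {chain_id!r}")
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))


def vina_config(receptor, ligand, center, box_size=22.0,
                exhaustiveness=8, num_modes=9, out="docked.pdbqt"):
    """Text of an AutoDock Vina configuration file with a cubic box around `center`."""
    cx, cy, cz = center
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {box_size}",
        f"size_y = {box_size}",
        f"size_z = {box_size}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"


def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a minimal fixed-column PDB ATOM record (for the example only)."""
    return (f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Hypothetical complex: receptor chain A (one CA shown) and reference peptide chain B.
pdb_text = "\n".join([
    atom_line(1, " CA", "GLY", "A", 10, 20.0, 20.0, 20.0),
    atom_line(2, " CA", "ALA", "B", 1, 0.0, 0.0, 0.0),
    atom_line(3, " CA", "VAL", "B", 2, 3.0, 0.0, 0.0),
    atom_line(4, " CA", "PRO", "B", 3, 6.0, 3.0, 0.0),
])
center = ca_centroid(pdb_text, "B")
print(vina_config("receptor.pdbqt", "peptide.pdbqt", center))
```

Vina expects receptor and ligand in PDBQT format, which is why the hypothetical file names use that extension.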
The files obtained for the nine selected poses are then merged using the Vega ZZ software, and the RMSD is calculated for each of them to determine which is the best model. As a rule of thumb, a good model presents an RMSD of at most 2.5 Å, lower values being better.
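The selection of the best model among the saved poses can be sketched as follows. This minimal Python sketch assumes that the RMSD of each pose has already been computed (for instance in Vega ZZ) and is listed in the order in which the docking program ranked the poses; the values are hypothetical. It reports both the RMSD of the top-ranked pose and that of the pose closest to the crystal structure, a distinction that appears to correspond to the "Best scoring pose" and "Best RMSD pose" columns of Tables 1 and 2 (an interpretation, not stated in the text).

```python
GOOD_POSE_RMSD = 2.5  # angstrom; rule-of-thumb threshold quoted in the text


def summarize_poses(pose_rmsds):
    """Summarize the RMSDs (angstrom) of the saved poses of one peptide.

    `pose_rmsds` is assumed to follow the docking program's ranking, so index 0
    is the best-scoring pose. Returns the RMSD of the best-scoring pose, the
    index and RMSD of the pose closest to the crystal structure, and whether
    that closest pose meets the threshold.
    """
    if not pose_rmsds:
        raise ValueError("no poses given")
    if any(r < 0 for r in pose_rmsds):
        raise ValueError("an RMSD cannot be negative; check how it was computed")
    best_index = min(range(len(pose_rmsds)), key=pose_rmsds.__getitem__)
    best_rmsd = pose_rmsds[best_index]
    return {
        "best_scoring_pose_rmsd": pose_rmsds[0],
        "best_rmsd_pose_index": best_index,
        "best_rmsd": best_rmsd,
        "acceptable": best_rmsd <= GOOD_POSE_RMSD,
    }


# Hypothetical RMSD values for the nine poses of one peptide, in ranking order.
rmsds = [3.4, 2.9, 1.8, 4.1, 2.2, 5.0, 3.7, 2.6, 4.4]
print(summarize_poses(rmsds))
```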
3.3. Assessment of docking in the LEADS-PEP benchmark data set by Plants molecular modelling method
(Kindly upload this information)
4. Results
Tables 1 and 2 present the results obtained from modelling the docking of the different peptides with the VINA and PLANTS software, respectively.
Table 1: Results obtained using VINA

| Number of residues | PDB | Sequence | Best ranking total score | Best scoring pose | Best RMSD pose |
|---|---|---|---|---|---|
| 3 | 1B9J | KLK | -140.148 | 2.04 | 0.3 |
| 3 | 2OY2 | IAG | -42.373 | 0.8 | -1.2 |
| 3 | 3GQ1 | WLF | -70.58 | 1.9 | 0.75 |
| 3 | 3BS4 | NIF | -111.904 | 2.6 | -0.6 |
| 3 | 2OXW | IAG | -44.521 | 2.1 | -0.59 |
| 3 | 2B6N | APT | -62.9332 | 4.76 | -1.22 |
| 4 | 1TW6 | AVPI | -46.159 | 1.8 | -1.1 |
| 4 | 3VQG | VTLV | -88.37 | 3.9 | -0.5 |
| 4 | 1UOP | GFEP | -39.97 | 3.2 | 0 |
| 4 | 4C2C | AVPA | -95.62 | 3.4 | -0.9 |
| 4 | 4J44 | AIAV | -80.02 | 2.9 | -0.8 |
| 5 | 2HPL | DDLYG | -89.158 | 3.6 | 0 |
| 5 | 2V3S | GRFQV | -46.383 | 3.7 | 1.03 |
| 5 | 3NFK | GETRL | 255.526 | 5.5 | 1.7 |
| 5 | 1NVR | ASVSA | -30.793 | 3.2 | 0.15 |
| 5 | 4V3I | DLTRP | -73.38 | 4.6 | 0.3 |
| 5 | 3T6R | ARTKQ | -112.35 | 1.6 | 1.7 |
| 6 | 1SVZ | PQFSLW | -43.61 | 3.3 | 2.5 |
| 6 | 3D1E | GQLGLF | -9.09 | 2.4 | 0.24 |
| 6 | 3IDG | ALDKWD | -49.53 | 4 | 1.03 |
| 6 | 3LNY | EQVSAV | -36.54 | 5 | 0.5 |
| 6 | 4NNM | YPTSII | -86.93 | 3.7 | -1.3 |
| 6 | 4Q6H | VQDTRL | -85.6 | 4.7 | -1.4 |
| 7 | 3MMG | ETVRFQS | 32.77 | 5.4 | 1 |
| 7 | 3Q47 | NPISDVD | -91.19 | 3.5 | 0 |
| 7 | 3UPV | PTVEEVD | -96.5 | 4.7 | 0.7 |
| 7 | 4QBR | ARTKQTA | -86.61 | 4.7 | -1.7 |
| 7 | 3NJG | PQIINRP | 46.44 | 5.5 | -0.54 |
| 8 | 1ELW | GPTIEEVD | -79.169 | 4.17 | -2.7 |
| 8 | 3CH8 | PQPVDSWV | -83.2148 | 5.6 | 0.47 |
| 8 | 4WLB | SLLKKLLD | -101.04 | 4.6 | 0.7 |
| 8 | 1OU8 | GAANDENY | -98.4172 | 4.78 | 1.2 |
| 8 | 1N7F | ATVRTYSC | 12.78 | 4.9 | 0.83 |
| 9 | 3OBQ | PTPSAPVPL | -85.1 | 3.8 | -0.7 |
| 9 | 4BTB | PPPPPPPPP | -63.13 | 4.4 | -1 |
| 9 | 2W0Z | APPPRPPKP | -21.159 | 1.3 | 3.31 |
| 9 | 4N7H | EAPPSYAEYAEV | 37.92 | 2.27 | 1.3 |
| 9 | 2QAB | KILHRLLQD | 824.587 | 4.5 | -0.92 |
| 10 | 1H6W | SLNYIIKVKE | 563 | 4.4 | -0.5 |
| 10 | 3BRL | ATSAKATQTD | 32.547 | 5.6 | 0.05 |
| 10 | 1NTV | NFDNPVYRKT | 268.071 | 4.3 | -0.8 |
| 10 | 4DS1 | YAESGIQTDL | -78.7908 | 6.3 | -0.1 |
| 10 | 2O02 | GLLDALDLAS | 794.143 | 5.9 | -1.3 |
| 11 | 1N12 | SDVAFRGNLLD | 417.073 | 5.6 | 3.9 |
| 11 | 2XFX | VGYPKVKEEML | 105.37 | 5.7 | -2.8 |
| 11 | 3BFW | DSTITIRGYVR | 417.077 | 5.6 | 0.8 |
| 11 | 4EIK | SLARRPLPPLP | 168.116 | 3.3 | 1.2 |
| 11 | 3DS1 | ITFEDLLDYYGP | -78.79 | 6.4 | 3.1 |
| 12 | 4J8S | RRLPIFNRISVS | -81.05 | 1.6 | 3.3 |
| 12 | 2W10 | PPPRPTAPKPLL | 25.134 | 4.2 | -0.3 |
| 12 | 3JZO | LTFEHYWAQLTS | -64.5362 | 1.98 | 3.9 |
| 12 | 4DGY | QLINTNGSWHIN | -84.9614 | 6.1 | 2.7 |
| 12 | 2B9H | RRNLKGLNLNLH | 116.047 | 5.1 | 0.76 |
Table 2: Results obtained using PLANTS

| Number of residues | PDB | Sequence | Best pose | Best scoring pose | Best RMSD pose |
|---|---|---|---|---|---|
| 3 | 1B9J | KLK | 1.8 | 1.8 | 1.5 |
| 3 | 2OY2 | IAG | 0.2 | 0.9 | 1.4 |
| 3 | 3GQ1 | WLF | 2.6 | 2.7 | 1.85 |
| 3 | 3BS4 | NIF | 1.6 | 4.64 | 2.2 |
| 3 | 2OXW | IAG | 0.9 | 2.06 | 1.49 |
| 3 | 2B6N | APT | 0.18 | 2.12 | 1.4 |
| 4 | 1TW6 | AVPI | 0.3 | 1.6 | 1.4 |
| 4 | 3VQG | VTLV | 1.7 | 3.7 | 2.2 |
| 4 | 1UOP | GFEP | 1.6 | 2 | 1.6 |
| 4 | 4C2C | AVPA | 0.8 | 2.5 | 1.7 |
| 4 | 4J44 | AIAV | 0.6 | 2.7 | 1.4 |
| 5 | 2HPL | DDLYG | 2.7 | 2.7 | 2.7 |
| 5 | 2V3S | GRFQV | 3.59 | 5.4 | 2.56 |
| 5 | 3NFK | GETRL | 5.8 | 6.5 | 4.1 |
| 5 | 1NVR | ASVSA | 0.65 | 2.99 | 0.5 |
| 5 | 4V3I | DLTRP | 2.4 | 3.13 | 2.1 |
| 5 | 3T6R | ARTKQ | 3.3 | 4.6 | 1.6 |
| 6 | 1SVZ | PQFSLW | 3.2 | 5.1 | 0.7 |
| 6 | 3D1E | GQLGLF | 2.5 | 4.5 | 2.26 |
| 6 | 3IDG | ALDKWD | 2.53 | 3 | 1.5 |
| 6 | 3LNY | EQVSAV | 2.4 | 5.3 | 1.9 |
| 6 | 4NNM | YPTSII | 1.8 | 1.8 | 3.1 |
| 6 | 4Q6H | VQDTRL | 2.5 | 3.6 | 3.9 |
| 7 | 3MMG | ETVRFQS | 3.7 | 4.7 | 2.7 |
| 7 | 3Q47 | NPISDVD | 2.4 | 3.9 | 2.4 |
| 7 | 3UPV | PTVEEVD | 2.5 | 4 | 1.8 |
| 7 | 4QBR | ARTKQTA | 2.2 | 2.2 | 3.9 |
| 7 | 3NJG | PQIINRP | 3.6 | 5 | 4.14 |
| 8 | 1ELW | GPTIEEVD | 0.4 | 5 | 3.1 |
| 8 | 3CH8 | PQPVDSWV | 2.77 | 5.25 | 2.3 |
| 8 | 4WLB | SLLKKLLD | 2.8 | 5.2 | 2.1 |
| 8 | 1OU8 | GAANDENY | 4.2 | 6.1 | 3 |
| 8 | 1N7F | ATVRTYSC | 3.57 | 2.9 | 2.74 |
| 9 | 3OBQ | PTPSAPVPL | 2.3 | 3.3 | 3 |
| 9 | 4BTB | PPPPPPPPP | 0.9 | 1 | 1.9 |
| 9 | 2W0Z | APPPRPPKP | 4.33 | 5.8 | 1.02 |
| 9 | 4N7H | EAPPSYAEYAEV | 4.3 | 4.71 | 3 |
| 9 | 2QAB | KILHRLLQD | 3.48 | 5.9 | 4.4 |
| 10 | 1H6W | SLNYIIKVKE | 3.3 | 5.4 | 3.8 |
| 10 | 3BRL | ATSAKATQTD | 3.45 | 7.9 | 3.4 |
| 10 | 1NTV | NFDNPVYRKT | 3.5 | 6 | 4.3 |
| 10 | 4DS1 | YAESGIQTDL | 2.4 | 5.5 | 2.5 |
| 10 | 2O02 | GLLDALDLAS | 3.6 | 4.5 | 4.9 |
| 11 | 1N12 | SDVAFRGNLLD | 7.5 | 8.9 | 3.6 |
| 11 | 2XFX | VGYPKVKEEML | 1.9 | 3.2 | 4.7 |
| 11 | 3BFW | DSTITIRGYVR | 4.4 | 8.6 | 3.6 |
| 11 | 4EIK | SLARRPLPPLP | 4.4 | 8.1 | 3.2 |
| 11 | 3DS1 | ITFEDLLDYYGP | 5.5 | 6.4 | 2.4 |
| 12 | 4J8S | RRLPIFNRISVS | 4.9 | 6.8 | 1.6 |
| 12 | 2W10 | PPPRPTAPKPLL | 2.6 | 5.8 | 2.9 |
| 12 | 3JZO | LTFEHYWAQLTS | 5.3 | 7 | 1.4 |
| 12 | 4DGY | QLINTNGSWHIN | 5.4 | 5.6 | 2.7 |
| 12 | 2B9H | RRNLKGLNLNLH | 6.8 | 4.82 | 5.58 |
Table 3, on the other hand, presents the binding affinities calculated from the docking models.
Table 3: Calculation of the docking binding affinity

| Number of residues | PDB | Sequence | Binding affinity (kcal/mol) |
|---|---|---|---|
| 3 | 1B9J | KLK | -9.7 |
| 3 | 2OY2 | IAG | -5.5 |
| 3 | 3GQ1 | WLF | -6.3 |
| 3 | 3BS4 | NIF | -9.6 |
| 3 | 2OXW | IAG | -5.1 |
| 3 | 2B6N | APT | 0 |
| 4 | 1TW6 | AVPI | -4.9 |
| 4 | 3VQG | VTLV | -6.8 |
| 4 | 1UOP | GFEP | -7.4 |
| 4 | 4C2C | AVPA | -7.8 |
| 4 | 4J44 | AIAV | -6.5 |
| 5 | 2HPL | DDLYG | -6 |
| 5 | 2V3S | GRFQV | -3.6 |
| 5 | 3NFK | GETRL | -4.4 |
| 5 | 1NVR | ASVSA | -4.4 |
| 5 | 4V3I | DLTRP | -5.5 |
| 5 | 3T6R | ARTKQ | -5.3 |
| 6 | 1SVZ | PQFSLW | -6.5 |
| 6 | 3D1E | GQLGLF | -5.9 |
| 6 | 3IDG | ALDKWD | -5.8 |
| 6 | 3LNY | EQVSAV | -4.8 |
| 6 | 4NNM | YPTSII | -7.8 |
| 6 | 4Q6H | VQDTRL | -5.6 |
| 7 | 3MMG | ETVRFQS | -6.4 |
| 7 | 3Q47 | NPISDVD | -6.2 |
| 7 | 3UPV | PTVEEVD | -7.4 |
| 7 | 4QBR | ARTKQTA | -6.4 |
| 7 | 3NJG | PQIINRP | -5.6 |
| 8 | 1ELW | GPTIEEVD | -4.8 |
| 8 | 3CH8 | PQPVDSWV | -8.8 |
| 8 | 4WLB | SLLKKLLD | -4.8 |
| 8 | 1OU8 | GAANDENY | -5.8 |
| 8 | 1N7F | ATVRTYSC | -4.7 |
| 9 | 3OBQ | PTPSAPVPL | -6.9 |
| 9 | 4BTB | PPPPPPPPP | -6 |
| 9 | 2W0Z | APPPRPPKP | -4.9 |
| 9 | 4N7H | EAPPSYAEYAEV | -6 |
| 9 | 2QAB | KILHRLLQD | -4.6 |
| 10 | 1H6W | SLNYIIKVKE | -3.5 |
| 10 | 3BRL | ATSAKATQTD | -4.3 |
| 10 | 1NTV | NFDNPVYRKT | -5.8 |
| 10 | 4DS1 | YAESGIQTDL | -6.5 |
| 10 | 2O02 | GLLDALDLAS | -6.6 |
| 11 | 1N12 | SDVAFRGNLLD | -5.7 |
| 11 | 2XFX | VGYPKVKEEML | -8.4 |
| 11 | 3BFW | DSTITIRGYVR | -5.4 |
| 11 | 4EIK | SLARRPLPPLP | -6.1 |
| 11 | 3DS1 | ITFEDLLDYYGP | 0 |
| 12 | 4J8S | RRLPIFNRISVS | -6 |
| 12 | 2W10 | PPPRPTAPKPLL | -3.7 |
| 12 | 3JZO | LTFEHYWAQLTS | -4.4 |
| 12 | 4DGY | QLINTNGSWHIN | -5.8 |
| 12 | 2B9H | RRNLKGLNLNLH | -3.8 |
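Because Vina-type binding affinities are estimated binding free energies, they can be converted into dissociation constants for comparison with experimentally measured binding constants (step 6 in Section 1.2), using $\Delta G = RT\ln K_d$. The Python sketch below assumes a temperature of 298.15 K and treats the docking score as a standard binding free energy, which is only a rough approximation for an empirical scoring function. It also assumes, as a hypothesis not stated in the text, that the two 0 kcal/mol entries of Table 3 mark missing values.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_affinity(delta_g_kcal, temperature=298.15):
    """Dissociation constant (mol/L) from a binding free energy (kcal/mol): Kd = exp(dG / RT)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature))


# A few binding affinities from Table 3 (kcal/mol).
table3 = {"1B9J": -9.7, "3CH8": -8.8, "2XFX": -8.4, "1H6W": -3.5, "2B6N": 0.0}

for pdb, dg in sorted(table3.items(), key=lambda item: item[1]):
    if dg == 0.0:
        # Assumption: an affinity of exactly 0 marks a missing value, not Kd = 1 M.
        print(f"{pdb}: no affinity available")
        continue
    print(f"{pdb}: dG = {dg:5.1f} kcal/mol -> Kd ~ {kd_from_affinity(dg):.1e} M")
```

More negative affinities give smaller Kd values, i.e. tighter predicted binding; under these assumptions, -9.7 kcal/mol corresponds to a Kd of roughly 8 × 10⁻⁸ M.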
5. Discussion of the results
The objective of the present work is to evaluate the performance of two modelling methods (VINA and PLANTS) in predicting the docking of the small peptides contained in the LEADS-PEP benchmark database to their target proteins. This comparison is carried out using the root mean square deviation (RMSD), which, as mentioned earlier, measures the difference between the predicted positions of the atoms in the peptide and their real positions as measured by crystallographic data.
However, as stated in the Introduction, we need to consider the length of the peptide being modelled, since previous experiments carried out by other research groups (Hauser & Windshügel, 2016) demonstrated that the RMSD values increase with the number of amino acid residues.
A similar effect has been observed in the current case, as indicated by the comparison of the average RMSD values of the four-residue 4J44 and the eleven-residue 3DS1 peptides. In this regard, Figures 5 and 6 present the calculation of the RMSD for the two selected peptides. As can be observed, the calculated RMSD is significantly higher for the 3DS1 peptide than for the 4J44 peptide.
Figure 5: RMSD calculation for the four-residue 4J44 peptide
Figure 6: RMSD calculation for the eleven-residue 3DS1 peptide
This difference can easily be understood by comparing the three-dimensional structures of both peptides, presented in Figures 7 and 8, respectively. The greater length of the 3DS1 peptide implies that the model is significantly more complicated, and therefore less accurate, than the one used for the shorter 4J44 peptide.
Figure 7: Structure of the 4J44 peptide
Figure 8: Structure of the 3DS1 peptide
As can be observed from the comparison presented in Figure 9, the effect of peptide length on the RMSD values holds across the whole set of peptides, as indicated by the positive trend, according to which the RMSD value slightly increases with the number of residues.
Figure 9: Comparison of the RMSD values obtained for the different peptides
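The positive trend described above can be quantified as the least-squares slope of RMSD against the number of residues. As an illustration only, the Python sketch below applies it to the PLANTS RMSD values of Table 4; the text does not state which values the comparison figure plots, so this choice of data is an assumption.

```python
def least_squares_slope(xs, ys):
    """Slope b of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope undefined")
    return sxy / sxx


# RMSD (PLANTS) values in angstrom from Table 4, grouped by number of residues.
plants_rmsd_by_length = {
    3: [1.5, 1.4, 1.85, 2.2, 1.49, 1.4],
    4: [1.4, 2.2, 1.6, 1.7, 1.4],
    5: [2.7, 2.56, 4.1, 0.5, 2.1, 1.6],
    6: [0.7, 2.26, 1.5, 1.9, 3.1, 3.9],
    7: [2.7, 2.4, 1.8, 3.9, 4.14],
    8: [3.1, 2.3, 2.1, 3, 2.74],
    9: [3, 1.9, 1.02, 3, 4.4],
    10: [3.8, 3.4, 4.3, 2.5, 4.9],
    11: [3.6, 4.7, 3.6, 3.2, 2.4],
    12: [1.6, 2.9, 1.4, 2.7, 4.82],
}
lengths = [n for n, values in plants_rmsd_by_length.items() for _ in values]
rmsds = [r for values in plants_rmsd_by_length.values() for r in values]
slope = least_squares_slope(lengths, rmsds)
print(f"RMSD change per additional residue: {slope:+.2f} angstrom")
```

A positive slope confirms the qualitative trend; a formal test (for example of the correlation coefficient) would be needed to claim significance.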
Having said this, it is important to focus on how well the two models predict the three-dimensional structure and docking ability of the different peptides in the LEADS-PEP database. In this regard, Table 4 compares the RMSD values obtained with the two methods. As can be observed, the RMSD value obtained with the VINA method is generally smaller than that obtained for the same peptide with the PLANTS method. Hence, the RMSD values were lower with the VINA modelling method for 45 out of the 53 peptides analyzed.
Table 4: Comparison of the RMSD values obtained using VINA and PLANTS methods

| Number of residues | PDB | Sequence | RMSD, VINA (Å) | RMSD, PLANTS (Å) | Best method |
|---|---|---|---|---|---|
| 3 | 1B9J | KLK | 0.3 | 1.5 | VINA |
| 3 | 2OY2 | IAG | -1.2 | 1.4 | VINA |
| 3 | 3GQ1 | WLF | 0.75 | 1.85 | VINA |
| 3 | 3BS4 | NIF | -0.6 | 2.2 | VINA |
| 3 | 2OXW | IAG | -0.59 | 1.49 | VINA |
| 3 | 2B6N | APT | -1.22 | 1.4 | VINA |
| 4 | 1TW6 | AVPI | -1.1 | 1.4 | VINA |
| 4 | 3VQG | VTLV | -0.5 | 2.2 | VINA |
| 4 | 1UOP | GFEP | 0 | 1.6 | VINA |
| 4 | 4C2C | AVPA | -0.9 | 1.7 | VINA |
| 4 | 4J44 | AIAV | -0.8 | 1.4 | VINA |
| 5 | 2HPL | DDLYG | 0 | 2.7 | VINA |
| 5 | 2V3S | GRFQV | 1.03 | 2.56 | VINA |
| 5 | 3NFK | GETRL | 1.7 | 4.1 | VINA |
| 5 | 1NVR | ASVSA | 0.15 | 0.5 | VINA |
| 5 | 4V3I | DLTRP | 0.3 | 2.1 | VINA |
| 5 | 3T6R | ARTKQ | 1.7 | 1.6 | PLANTS |
| 6 | 1SVZ | PQFSLW | 2.5 | 0.7 | PLANTS |
| 6 | 3D1E | GQLGLF | 0.24 | 2.26 | VINA |
| 6 | 3IDG | ALDKWD | 1.03 | 1.5 | VINA |
| 6 | 3LNY | EQVSAV | 0.5 | 1.9 | VINA |
| 6 | 4NNM | YPTSII | -1.3 | 3.1 | VINA |
| 6 | 4Q6H | VQDTRL | -1.4 | 3.9 | VINA |
| 7 | 3MMG | ETVRFQS | 1 | 2.7 | VINA |
| 7 | 3Q47 | NPISDVD | 0 | 2.4 | VINA |
| 7 | 3UPV | PTVEEVD | 0.7 | 1.8 | VINA |
| 7 | 4QBR | ARTKQTA | -1.7 | 3.9 | VINA |
| 7 | 3NJG | PQIINRP | -0.54 | 4.14 | VINA |
| 8 | 1ELW | GPTIEEVD | -2.7 | 3.1 | VINA |
| 8 | 3CH8 | PQPVDSWV | 0.47 | 2.3 | VINA |
| 8 | 4WLB | SLLKKLLD | 0.7 | 2.1 | VINA |
| 8 | 1OU8 | GAANDENY | 1.2 | 3 | VINA |
| 8 | 1N7F | ATVRTYSC | 0.83 | 2.74 | VINA |
| 9 | 3OBQ | PTPSAPVPL | -0.7 | 3 | VINA |
| 9 | 4BTB | PPPPPPPPP | -1 | 1.9 | VINA |
| 9 | 2W0Z | APPPRPPKP | 3.31 | 1.02 | PLANTS |
| 9 | 4N7H | EAPPSYAEYAEV | 1.3 | 3 | VINA |
| 9 | 2QAB | KILHRLLQD | -0.92 | 4.4 | VINA |
| 10 | 1H6W | SLNYIIKVKE | -0.5 | 3.8 | VINA |
| 10 | 3BRL | ATSAKATQTD | 0.05 | 3.4 | VINA |
| 10 | 1NTV | NFDNPVYRKT | -0.8 | 4.3 | VINA |
| 10 | 4DS1 | YAESGIQTDL | -0.1 | 2.5 | VINA |
| 10 | 2O02 | GLLDALDLAS | -1.3 | 4.9 | VINA |
| 11 | 1N12 | SDVAFRGNLLD | 3.9 | 3.6 | PLANTS |
| 11 | 2XFX | VGYPKVKEEML | -2.8 | 4.7 | VINA |
| 11 | 3BFW | DSTITIRGYVR | 0.8 | 3.6 | VINA |
| 11 | 4EIK | SLARRPLPPLP | 1.2 | 3.2 | VINA |
| 11 | 3DS1 | ITFEDLLDYYGP | 3.1 | 2.4 | PLANTS |
| 12 | 4J8S | RRLPIFNRISVS | 3.3 | 1.6 | PLANTS |
| 12 | 2W10 | PPPRPTAPKPLL | -0.3 | 2.9 | VINA |
| 12 | 3JZO | LTFEHYWAQLTS | 3.9 | 1.4 | PLANTS |
| 12 | 4DGY | QLINTNGSWHIN | 2.7 | 2.7 | Both |
| 12 | 2B9H | RRNLKGLNLNLH | 0.76 | 4.82 | VINA |
Finally, taking an RMSD value below 2.5 as the threshold for an accurate description of the docking process, we can conclude that the VINA method not only provides the most accurate results for most of the peptides, but also provides a satisfactory model of the docking process for 48 out of the 53 peptides studied. In contrast, the PLANTS method only provides an accurate model for 28 out of the 53 peptides studied.
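The accuracy criterion used here (RMSD below 2.5) amounts to counting how many values fall under a fixed cut-off. A minimal Python sketch is shown next, assuming the comparison is strict ("below"); `count_accurate` is a hypothetical helper, and the sample values are the VINA and PLANTS RMSD values of five peptides taken from Table 4.

```python
ACCURACY_THRESHOLD = 2.5  # RMSD cut-off taken from the text; values strictly below it count as accurate


def count_accurate(rmsd_values, threshold=ACCURACY_THRESHOLD):
    """Return how many RMSD values lie strictly below the threshold."""
    return sum(1 for value in rmsd_values if value < threshold)


# RMSD values for 1B9J, 3T6R, 1SVZ, 2W0Z and 4DGY, copied from Table 4.
vina_sample = [0.3, 1.7, 2.5, 3.31, 2.7]
plants_sample = [1.5, 1.6, 0.7, 1.02, 2.7]

if __name__ == "__main__":
    print("VINA accurate:", count_accurate(vina_sample), "of", len(vina_sample))
    print("PLANTS accurate:", count_accurate(plants_sample), "of", len(plants_sample))
```

Under a strict comparison, a value of exactly 2.5 (as for 1SVZ with VINA) is not counted as accurate; switching to `<=` would include it, so the choice of operator can shift the final counts slightly.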
This result is again in agreement with published results, according to which the VINA modelling software can provide satisfactory modelling results when analyzing the docking of the small peptides contained in the LEADS-PEP database, as long as the peptide length is kept below 12 amino acid residues. It should be noted, however, that the validity of this method for predicting the docking behavior of longer peptides should be carefully analyzed, considering the dramatic effect that peptide length has on the complexity of the model and on the resulting RMSD value.
6. Conclusions
The main conclusions that can be derived from the study carried out during this Master Thesis can be summarized as follows:
• Despite not having been designed for this purpose, the VINA modelling software can successfully be applied to modelling the docking behavior of small peptides.
• The VINA modelling software seems to be much better at predicting the docking behavior than other modelling software such as PLANTS.
• The validity of the VINA modelling software for predicting the docking behavior of longer peptides should be carefully analyzed.
References
FMSH team. (2009). 3D modelling of proteins. Retrieved February 21, 2017, from http://biologii.net/world/prot4.html
Hauser, A.S., & Windshügel, B. (2016). LEADS-PEP: A benchmark data set for assessment of peptide docking performance. Journal of Chemical Information and Modeling, 56, 188-200.
Huang, S.H. (2014). Search strategies and evaluation in protein-protein docking: principles, advances and challenges. Drug Discovery Today, 19, 1081-1096.
Meyer, E.F., Swanson, S.M., & Williams, J.A. (2000). Molecular modelling and drug design. Pharmacology & Therapeutics, 85, 113-121.
(TITLE)
(NAME)
Master Thesis
(UNIVERSITY)
(DATE)
Table of contents
Acknowledgements
Abstract
1. Introduction
1.1. What is molecular modelling?
1.2. What are the steps involved in molecular modelling for the assessment of the docking reaction between peptides and proteins?
1.3. Previous research: Development of a benchmark data set for the assessment of peptide docking by LEADS-PEP
2. Objectives
3. Methods
3.1. Materials
3.2. Assessment of docking in the LEADS-PEP benchmark data set by Vina molecular modelling method
3.3. Assessment of docking in the LEADS-PEP benchmark data set by Plants molecular modelling method
4. Results
5. Discussion of the results
6. Conclusions
References
Acknowledgements
I would like to thank the University of …. for having provided me with all the necessary means to carry out the research presented in this Master Thesis. Furthermore, I would like to thank my tutor …. for his/her guidance and support throughout the whole process.
I am especially thankful to my family and friends for their constant support and encouragement during this year and throughout my academic career.
Finally, I would like to thank the …. for the financial support provided, which has enabled me to carry out the work described in this Master Thesis.
Abstract
Two different modelling programs, namely VINA and PLANTS, have been analyzed for their ability to predict the docking behavior of the small peptides contained in the LEADS-PEP database.
The comparison of the obtained results confirmed previously published findings according to which the VINA software seems to be especially useful for evaluating the docking of small peptides, even though it was not initially developed for this purpose. In this regard, the obtained RMSD values clearly indicated that the VINA modelling software provided accurate results for 45 out of the 53 peptides analyzed in this study. In contrast, the PLANTS modelling software was only able to describe the docking behavior of 28 out of the 53 analyzed peptides.
1. Introduction
1.1. What is molecular modelling?
Molecular modelling is the scientific discipline that uses computers to build and study the structure of molecules by means of models based on quantum mechanics. Most of these models rely on solving the Schrödinger equation:
$$\hat{H}\Psi = E\Psi$$
where $\hat{H}$ is the Hamiltonian operator, $\Psi$ is the wave function and $E$ is the energy.
The solution of this equation is relatively simple for the hydrogen atom, but becomes highly complicated for larger molecules. For this reason, several approximations have been developed to solve it for complex systems such as proteins and small peptides like the ones analyzed in the present thesis.
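The hydrogen atom is the textbook case in which the equation can be solved exactly. For reference, its bound-state energy levels take the well-known closed form below (a standard result quoted for illustration, not derived in this thesis), where $m_e$ is the electron mass, $e$ the elementary charge, $\varepsilon_0$ the vacuum permittivity, $h$ Planck's constant and $n$ the principal quantum number:

```latex
E_n = -\frac{m_e e^4}{8 \varepsilon_0^2 h^2} \, \frac{1}{n^2} \approx -\frac{13.6\ \mathrm{eV}}{n^2}, \qquad n = 1, 2, 3, \ldots
```

No comparable closed-form expression exists once several electrons interact, which is why approximations are needed for proteins and peptides.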
Molecular modelling enables scientists to predict not only the structure of a given molecule, but also its reactivity under pre-established conditions. This provides a straightforward way to validate the results obtained from molecular modelling: for example, a property such as the acidity of the molecule can be predicted and compared with the value obtained from an acid-base titration experiment. Since several approximations are used during the modelling process, only an approximate result will be obtained in most cases.
However, the possibility of validating the calculated results in this way is critical for deciding which calculation method is best suited to a given application. The common practice therefore consists of modelling the chemical structure of the molecule through different approximation methods and then comparing the results to evaluate which one provides the best description of the experimentally observed variable.
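The comparison step described above can be sketched as choosing the approximation whose prediction lies closest to the measured value. In the minimal Python sketch below, absolute error is assumed as the comparison metric, and the method names and all numbers are hypothetical placeholders (e.g. predicted versus titrated pKa values), not results from this thesis.

```python
from typing import Dict


def best_approximation(predictions: Dict[str, float], experimental: float) -> str:
    """Return the name of the method whose prediction has the smallest absolute error."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    return min(predictions, key=lambda method: abs(predictions[method] - experimental))


if __name__ == "__main__":
    # Hypothetical predicted pKa values from three approximation methods.
    predicted_pka = {"method_A": 5.2, "method_B": 4.7, "method_C": 4.1}
    measured_pka = 4.76  # hypothetical value from an acid-base titration
    print(best_approximation(predicted_pka, measured_pka))
```

When several observables are available, the individual errors would typically be aggregated before ranking the methods.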
Molecular modelling techniques are commonly used in physical chemistry, inorganic chemistry and biochemistry. They have been extensively applied to predict the structure and reactivity of small organic molecules, polymers, inorganic solids, inorganic liquids, liquid crystals and proteins. In biochemistry, molecular modelling techniques are especially useful for the conformational analysis of proteins, peptides and enzymes, among others. Hence, they can be used to predict the sequence of amino acids or the three-dimensional structure of the active site.
The main attraction of such models lies in their use in the design of new drugs. However, the application of molecular modelling to drug design has only been possible after the exponential increase in the number of known structures of proteins and enzymes, determined by techniques such as nuclear magnetic resonance, over the last five decades (Meyer, Swanson & Williams, 2000). In this regard, the inclusion of such structures in different databases has enabled scientists to access the required information quickly instead of having to purify each protein and characterize its structure.
Furthermore, the advances in computing experienced in recent years have enabled:
• An exponential decrease in the time required to perform each calculation
• A decrease in hardware requirements, meaning that anybody can nowadays perform a simple calculation without needing access to especially powerful computers
1.2. What are the steps involved in molecular modelling for the assessment of the
docking reaction between peptides and proteins?
The procedure used in molecular modelling to assess the complexation, or docking, reaction between peptides and proteins can be summarized as follows:
1. Identification and characterization of the target proteins: In this stage, the target proteins must be identified and characterized through molecular modelling to obtain their three-dimensional structure (Figure 1) and, most importantly, the sequence of amino acids present in the active site of the protein, to which the peptide will need to bind.
Figure 1. Sample of a three-dimensional structure of a protein obtained through
molecular modelling (Biologii.net, n.d.)
2. Once the target proteins have been identified, candidate peptides are selected from the available database according to the identified sequence of amino acids and the structure of the active site of the protein. At this stage, it is important to consider the chemical reactivity between the different functional groups present in the amino acids. For example, we should look for cysteine residues that might form an intermolecular disulfide bond between the protein and the peptide, or for acidic and basic amino acids, as they could react through an acid-base reaction and form a nearly ionic bond. The formation of these bonds will significantly increase the stability of the complex, therefore leading to a higher-affinity docking process. In addition to chemical reactivity, we should look for possible structural complementarity between the peptide and the protein (Figure 2).
Figure 2. Dock formed between the protein (green) and the peptide (purple) (Huang,
2014)
3. Once a set of candidate peptides that can favorably interact with the protein has been identified, their three-dimensional structure is either retrieved from the database (if available) or modelled. In this stage, special attention is paid to modelling the region of the peptide that will interact with the protein, as well as any surrounding amino acids that could cause steric repulsion and destabilize the protein-peptide complex.
4. When the structures of both the protein and the candidate peptides are well characterized, the docking process is modelled. This can be accomplished with different tools and software. The objective of these programs is to model the co-crystallized peptide-protein complex and to characterize the binding strength between the peptide and the protein by analyzing the energy change between the separate peptide and protein and the co-crystallized complex.
5. The data obtained by applying the above procedure are interpreted by calculating the root mean square deviation (RMSD). As in any regression model, calculating the RMSD involves comparing the predicted positions of the different atoms in the model with the experimental ones, taken from the crystallographic and NMR data available in the database. Taking this into account, the best model is the one providing the lowest RMSD value, as this implies that it most closely reproduces the positions of the atoms present in the binding site of both the peptide and the protein.
6. Finally, the obtained results are validated through a separate set of experiments. A common practice is to select the two or three peptides showing the strongest interaction with the protein according to the model, and to measure the binding constant by preparing different mixtures of peptide and protein and monitoring the change in a physical or chemical property during the reaction. For example, if the model predicts that the interaction between the peptide and the protein takes place through an acid-base reaction, the change in the pH of the solution could be used to monitor the docking reaction experimentally.
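Step 5 relies on the root mean square deviation between predicted and experimental atomic positions, $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{r}_i^{\mathrm{pred}} - \mathbf{r}_i^{\mathrm{exp}} \rVert^2}$. The minimal Python sketch below computes it, assuming both coordinate lists refer to the same N atoms in the same order and in the same reference frame (no superposition is performed); the function name `rmsd` is illustrative and is not the routine used internally by any docking program.

```python
import math


def rmsd(predicted, reference):
    """Root mean square deviation between two equally ordered lists of (x, y, z) coordinates.

    Assumes atom i in `predicted` corresponds to atom i in `reference` and that both
    sets are already expressed in the same frame (no alignment is performed here).
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(predicted))


if __name__ == "__main__":
    # Two hypothetical atoms, each displaced by 1 unit along z.
    docked_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    crystal_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(docked_pose, crystal_pose))  # 1.0
```

Comparing the docked peptide directly with the crystal pose, without re-alignment, is assumed here because the protein is treated as a fixed reference during docking.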
1.3. Previous research: Development of a benchmark data set for the assessment of
peptide docking by LEADS-PEP
Hauser and Windshügel (2016) recently used molecular modelling to evaluate the LEADS-PEP benchmark data set for assessing the docking performance of 53 different protein-peptide complexes. The formation of such complexes is biologically important, since these complexes regulate several essential cellular processes. Because they have been estimated to modulate around 40% of the cell's metabolism, their formation or dissociation is currently being used as the basis of peptide-based therapeutics for the treatment of, e.g., cancer, hepatitis C or metabolic diseases.
Taking this into account, the work of Hauser and Windshügel is a very good example of how molecular modelling can assist scientists in developing new drugs that are more effective and cause fewer side effects. In this sense, the interaction between the peptide (designed drug) and the protein in the cell is similar to that between an enzyme and its substrate or between an antibody and its antigen. This interaction is therefore presumed to be highly specific and very stable.
However, despite the importance that research in this field would have for the development of a more systematic drug design protocol, very few studies have been carried out so far. It should be noted that, even though several programs are currently able to analyze peptide docking data, such as the ones used in the present work, a comparison of the performance of these programs on a single common data set is still missing. Additionally, the fact that the databases are not publicly available represents one of the most important drawbacks of applying molecular modelling to the evaluation of the protein-peptide docking process, as it limits the comparability of the results obtained by the different modelling methods.
Considering, however, that molecular modelling has proven to be a valuable technique for modelling complexation reactions in small organic and inorganic molecules, scientists expect similar results to be obtained in the evaluation of peptide docking processes.
In this study, Hauser and Windshügel evaluated a total of 53 different peptides of well-characterized sequence contained in the LEADS-PEP database, which has the advantage of being publicly available. The peptides were selected to cover the full range of acid-base properties, as evaluated through the H-bond donor and H-bond acceptor properties of the different amino acids in each peptide's structure. This enabled the researchers to evaluate the influence of pH on the stability of the different complexes formed between the peptides and the protein.
The peptide length in this study was limited to a maximum of 12 amino acids since, according to the experiments carried out, the computing time increased exponentially with peptide length due to the higher number of atoms introduced in the model (Figure 3). In addition to peptide length, the computing time also depended on the specific model being used. This can be attributed to the number of simplifications made, such that simpler models tend to require less computing time but generally provide less accurate results.
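To illustrate why the cost grows so quickly with length, consider a deliberately simplified toy model (an assumption of this sketch, not the search procedure of any program discussed here): each residue contributes two backbone torsion angles, and each torsion is sampled at three discrete values. The number of conformations to enumerate is then $3^{2n}$ for a peptide of $n$ residues, so it grows by a factor of nine per added residue.

```python
def conformation_count(n_residues: int, torsions_per_residue: int = 2, states_per_torsion: int = 3) -> int:
    """Size of a discretized conformational space in a hypothetical toy model.

    Each residue is assumed to contribute `torsions_per_residue` rotatable backbone
    angles, each sampled at `states_per_torsion` discrete values.
    """
    return states_per_torsion ** (torsions_per_residue * n_residues)


if __name__ == "__main__":
    for length in (3, 6, 9, 12):
        print(f"{length:2d} residues: {conformation_count(length):,} conformations")
```

Real docking programs use stochastic or heuristic searches rather than exhaustive enumeration, so this only shows how the search space scales with length, not actual run times.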
Figure 3. Effect of peptide length and model on the computing time required to analyze the structure and stability of the peptide docked to the target protein (Hauser & Windshügel, 2016)
Another important result found by these authors is that peptide length significantly affects the accuracy of the results. In this regard, Hauser and Windshügel found that all the docking programs used in their study were able to accurately model the backbone structure of short peptides (with just 3 or 4 residues) but failed to accurately model longer peptides. Hence, as can be observed in Figure 4, the RMSD values obtained for the different models increased with peptide length.
Figure 4. Average RMSD values obtained for the prediction of the backbone position of the peptide by the AutoDock method using standard accuracy. Similar performance was obtained for other modelling methods.
According to the results of the study, the modelling method providing the most accurate results was Surflex, which accurately predicted the position of 13 different peptide conformations out of the 53 peptides analyzed. It should be noted, however, that the best modelling method in terms of RMSD depended on the specific sequence of the peptide being considered, such that each peptide had to be modelled with all the different methods in order to evaluate which one provided the most accurate result. This represents a very significant drawback, as the need to use several modelling methods dramatically increases the required computing time.
It should be noted, however, that the use of the high accuracy mode did not significantly affect the RMSD values obtained in comparison with the standard accuracy mode. Taking this result into account, and considering that selecting the high accuracy mode greatly increases the computing time, this mode does not seem to be a valuable tool for describing the complexes formed by short peptides.
The main conclusions that can be derived from the work carried out by Hauser and Windshügel (2016) can be summarized as:
• The LEADS-PEP database fills in an...