I need a writer to write a thesis for me in chemistry

User Generated

Arfzn123

Science

Description

I want the writer to write a thesis for me. I will upload a file that describes everything I want, and when I finish all the results I will upload them too. If you have any questions, please send me an email.

Unformatted Attachment Preview

Article (pubs.acs.org/jcim)
LEADS-PEP: A Benchmark Data Set for Assessment of Peptide Docking Performance
Alexander Sebastian Hauser and Björn Windshügel*
Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Schnackenburgallee 114, 22525 Hamburg, Germany
Supporting Information
ABSTRACT: With increasing interest in peptide-based therapeutics, the application of computational approaches such as peptide docking has also gained more and more attention. In order to assess the suitability of docking programs for peptide placement and to support the development of peptide-specific docking tools, an independently constructed benchmark data set is urgently needed. Here we present the LEADS-PEP benchmark data set for assessing peptide docking performance. Using a rational and unbiased workflow, 53 protein−peptide complexes with peptide lengths ranging from 3 to 12 residues were selected. The data set is publicly accessible at www.leads-x.org. In a second step we evaluated several small molecule docking programs for their potential to reproduce peptide conformations as present in LEADS-PEP. While most tested programs were capable of generating native-like binding modes of small peptides, only Surflex-Dock and AutoDock Vina performed reasonably well for peptides consisting of more than five residues. Rescoring of docking poses with the scoring functions ChemPLP, ChemScore, and ASP further increased the number of top-ranked near-native conformations. Our results suggest that small molecule docking programs are a good and fast alternative to specialized peptide docking programs.
■ INTRODUCTION Protein−peptide interactions are involved in numerous cellular processes and are estimated to account for up to 40% of all interactions within the cell.1 Therefore, it is not surprising that in recent years, the development of peptide-based therapeutics has gained increased interest in the pharmaceutical industry and this is expected to further grow in future.2−4 Between 2009 and 2013, 10% of the overall drug approvals were represented by peptides and several of these therapeutics are first-in-class drugs, such as boceprevir and telaprevir, both targeting the hepatitis C virus.5 As of today more than 100 peptide-based drugs have reached the pharmaceutical market and many more are currently investigated in clinical trials.3 Computational chemistry techniques have proven to successfully support the drug discovery process for small molecules, for example by virtual screening.6 The adaption of molecular modeling and docking methods for the prediction of peptide binding modes is currently under intensive development and evaluation.7,8 In particular peptide docking is challenging due to the large number of rotatable bonds and the resulting high flexibility of the molecule. On the other hand, peptides are composed of unique properties such as structural hierarchy and physical restrictions and simplicity that can be employed to improve protein−peptide docking strategies.7 So far, only few programs specifically designed for peptide docking have been developed. An example is Rosetta FlexPepDock. Several protocols are available that revealed good performance in terms of reproducing peptide conformations of different protein−peptide X-ray crystal structures.9,10 Another approach is DynaDock which has been shown to © 2015 American Chemical Society perform well across a data set of 15 protein−peptide complexes.11 In addition to specialized tools also small molecule docking programs have been tested for peptide docking. 
AutoDock has been shown to dock very short peptides (2−4 aa length) with reasonable accuracy.12 Very recently, a modified version of the docking program Glide performed as accurately as the Rosetta FlexPepDock ab initio protocol while being over 100 times faster.13 So far, all reports on peptide docking performance suffer from a lack of comparability, as the test sets used for assessment are not publicly accessible. It also cannot be excluded that these data sets are biased toward a specific tool.7 Therefore, an independently constructed and publicly available benchmark data set of protein−peptide complexes is urgently needed in order to compare available docking programs and to support their further development. For small molecules, several benchmark data sets for docking and virtual screening exist. To evaluate the potential of docking programs to reproduce binding modes as determined by X-ray crystallography, the Astex Diverse Set comprising 85 high-quality protein−ligand X-ray crystal structures can be utilized.14 For evaluation of virtual screening performance, the Directory of Useful Decoys (DUD) is a popular benchmark data set.15,16 An alternative for virtual screening assessment is provided by the Demanding Evaluation Kits for Objective In silico Screening (DEKOIS).17
Received: April 24, 2015. Published: December 14, 2015. DOI: 10.1021/acs.jcim.5b00234. Journal of Chemical Information and Modeling (J. Chem. Inf. Model.) 2016, 56, 188−200.
In this study we present LEADS-PEP, the first representative of the Lessons for Efficiency Assessment of Docking and Scoring (LEADS) collection. LEADS-PEP is a publicly available benchmark data set that enables the evaluation of docking programs for their potential to reproduce peptide binding modes and the comparison of different methods and parameters. The collection consists of 53 protein−peptide complexes that have been prepared using a rational and unbiased workflow.
In a second step we have utilized the data set for a detailed evaluation of several popular small molecule docking programs and scoring functions of which most have not been considered for peptide docking so far. AutoDock Vina24 (version 1.1.2) are Open Source software available from the Scripps Research Institute. In addition to GOLD’s default scoring function ChemPLP, all other implemented scoring functions (ASP, ChemScore, and GoldScore) were also investigated. Peptide and Protein Preparation. In order to prevent any bias by using coordinates present in the protein−peptide Xray crystal structures, all peptides were generated in a linear conformation (backbone torsion angles of 180°) within SYBYL-X 2.1.1 with charged termini and minimized utilizing Powell method with default settings. It was ensured that the peptide coordinates of the linearized peptides do not align with coordinates of the peptide binding site. Protein structures were prepared using Protonate3D within MOE. All water atoms were removed. Docking Tools. Surflex-Dock. The protomol file for each complex was built based on all residues within 5 Å of the cocrystallized peptide using a threshold of 0.01 and a bloat of 0. For docking with standard accuracy (SA) the density of search (spindense) was set to 6.0, the number of spins per alignment to 12 (nspin), and the additional starting conformations per molecule (multistart) to 6. For high accuracy (HA) docking the following settings were used: spindense 9.0, nspin 24, multistart 12. The Surflex-Dock “Total_Score” was used as the native scoring function.25 AutoDock. AutoDockTools within MGLTools (version 1.5.6) were utilized in order to generate PDBQT format files of the receptor and ligand. Grids maps were calculated with AutoGrid. The grid box was defined based on the cocrystallized ligand using a python script within MGLTools. Grid dimensions were increased in all six directions by 13 points (4.9 Å). 
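The stated padding of 13 points (4.9 Å) is consistent with a grid spacing of 0.375 Å, since 13 × 0.375 Å = 4.875 Å. That spacing is AutoGrid's default, but it is not given in the text, so it is treated as an assumption here. The following minimal Python sketch shows how such a box could be derived from the coordinates of a cocrystallized ligand; the function name and the example coordinates are illustrative and are not the authors' MGLTools script.

```python
import numpy as np

GRID_SPACING = 0.375   # Å per grid point (assumption: AutoGrid default, not stated in the text)
PADDING_POINTS = 13    # from the text: 13 extra points in each of the six directions


def padded_grid_box(ligand_xyz, spacing=GRID_SPACING, pad_points=PADDING_POINTS):
    """Return (center, size) in Å of a box enclosing the ligand plus padding."""
    coords = np.asarray(ligand_xyz, dtype=float)
    lower = coords.min(axis=0)
    upper = coords.max(axis=0)
    pad = pad_points * spacing              # 13 * 0.375 = 4.875 Å per side
    center = (lower + upper) / 2.0
    size = (upper - lower) + 2.0 * pad      # padding is added on both sides of each axis
    return center, size


# Hypothetical ligand extent, for illustration only
center, size = padded_grid_box([[0.0, 0.0, 0.0], [10.0, 4.0, 2.0]])
print(center, size)
```

Because the same box is reused for AutoDock Vina in this study, keeping center and size in Å rather than in grid points makes the values usable by both programs.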
All dockings were performed using the Lamarckian genetic algorithm with the maximum number of energy evaluations set to 2 500 000 (SA) or 25 000 000 (HA). As AutoDock does not handle ligands with more than 32 torsion angles, a recompiled version allowing up to 64 torsion angles was used for larger peptides.
AutoDock Vina. Grid dimensions were adopted from the preparation for AutoDock. The exhaustiveness was set to either 8 (SA) or 100 (HA).
GOLD. The docking site was defined by all residues within 5 Å of the cocrystallized peptide. For each available scoring function (ChemScore,26 ChemPLP,27 ASP,28 and GoldScore29) a separate docking was performed. The early termination option was switched off.
Pose Selection and RMSD Calculation. For all docking scenarios the number of docking runs was set to 20. As a measure of peptide docking accuracy, the root-mean-square deviation (RMSD) for backbone atoms (N, CA, C) was calculated using shell and SPL scripts. To evaluate external scoring functions, docking poses were rescored utilizing the rescoring option implemented in GOLD. All four scoring functions (ASP, ChemPLP, ChemScore, GoldScore) were tested with default settings. Only the nonminimized poses were analyzed. Figures with molecular representations were prepared using VMD30 and POV-Ray (www.povray.org).
■ EXPERIMENTAL SECTION Benchmark Data Set Generation. For generation of the LEADS-PEP data set, a selection process was developed (Figure S1). First, the Protein Data Bank (PDB)18 was queried for peptide-bound protein X-ray crystal structures with the following constraints: (i) the structure does not contain any DNA or RNA, (ii) it includes experimental data, (iii) the structure contains between two and four chains, (iv) at least one chain is between 2 and 15 amino acids long, (v) the resolution is < 2.0 Å, and (vi) Rfree is < 0.3. The query returned 1376 PDB entries (as of 29/05/2015), which were downloaded.
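The six query constraints can be written as a single predicate. The sketch below is only an illustration in Python: the `Entry` record and its field names are hypothetical and do not correspond to the PDB search interface or to the authors' workflow, and the two example entries are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical summary of one PDB entry (field names are assumptions)."""
    pdb_id: str
    has_nucleic_acid: bool        # (i) DNA or RNA present
    has_experimental_data: bool   # (ii) experimental data deposited
    chain_lengths: List[int]      # residues per chain, one item per chain
    resolution: float             # Å
    r_free: Optional[float]       # None if not reported


def passes_query(entry: Entry) -> bool:
    """Apply constraints (i) to (vi) as listed in the text."""
    return (
        not entry.has_nucleic_acid                            # (i)
        and entry.has_experimental_data                       # (ii)
        and 2 <= len(entry.chain_lengths) <= 4                # (iii)
        and any(2 <= n <= 15 for n in entry.chain_lengths)    # (iv)
        and entry.resolution < 2.0                            # (v)
        and entry.r_free is not None
        and entry.r_free < 0.3                                # (vi)
    )


# Invented entries, for illustration only
good = Entry("XXX1", False, True, [250, 9], 1.8, 0.22)
bad = Entry("XXX2", False, True, [250, 9], 2.4, 0.22)   # fails the resolution cutoff
print(passes_query(good), passes_query(bad))
```

Treating a missing Rfree as a failure is a choice made for this sketch; the text does not say how such entries were handled.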
Each structure was split into its protein and peptide chains. Peptide chains were further filtered for structures that do not include any hetero atoms and are not covalently linked to the protein chain. Complexes containing hetero atoms within 4 Å of the interface between protein and peptide were removed from the set. Subsequently, PROCHECK19 was used to analyze the residue-by-residue geometry and stereochemical quality of the complexes. Structures containing atoms in close distance (30% of the peptide residues have less than three van der Waals contacts to the protein) and/or crystallization artifacts were excluded. For most peptide lengths (3−12 residues) between five and six complexes were chosen for the data set. It was further attempted to include a broad set of peptides with different characteristics such as acidic, basic, hydrophobic, hydrophilic, or aromatic entities. The final peptide docking benchmark data set contains 53 complexes. Docking Programs and Scoring Functions. Within this study, we utilized the docking programs GOLD, Surflex-Dock, AutoDock and AutoDock Vina for the evaluation of their potential to reproduce cocrystallized peptide binding modes. Surflex-Dock21 (version 2.706.13302) is included in SYBYL-X 2.1.1 (Certara L.P., St. Louis, MO, USA). GOLD22 (version 5.2.2) was licensed from Cambridge Crystallographic Data Centre, Cambridge, UK. AutoDock23 (version 4.2.5.1) and ■ RESULTS Benchmark Data Set. We set up a workflow resulting in an unbiased selection of protein−peptide complexes with great structural and functional diversity on the basis of all publicly available X-ray crystal structures. For both proteins and 189 DOI: 10.1021/acs.jcim.5b00234 J. Chem. Inf. Model. 2016, 56, 188−200 Article Journal of Chemical Information and Modeling Table 1. 
Overview of Peptides Included in the LEADS-PEP Benchmark Data Set (a)

res  PDB   sequence      heavy atoms  rot bonds  ring count  H-bond acc  H-bond don  MW    log P
3    1B9J  KLK           27           15         0           9           11          390   0.2
3    2OY2  IAG           18           7          0           7           5           259   −0.4
3    3GQ1  WLF           34           11         3           8           6           465   3.5
3    3BS4  NIF           28           11         1           9           7           392   0.1
3    2OXW  IAG           18           7          0           7           5           259   −0.4
3    2B6N  APT           20           6          1           8           5           287   −1.5
4    1TW6  AVPI          28           9          1           9           5           399   1.0
4    3VQG  VTLV          30           12         0           10          7           431   1.0
4    1UOP  GFEP          32           11         2           11          5           447   −0.5
4    4C2C  AVPA          25           7          1           9           5           356   −0.4
4    4J44  AIAV          26           10         0           9           6           372   0.6
5    2HPL  DDLYG         41           18         1           16          8           580   −1.3
5    2V3S  GRFQV         43           19         1           16          14          607   −1.5
5    3NFK  GETRL         40           19         0           17          13          575   −2.3
5    1NVR  ASVSA         30           14         0           13          9           433   −3.3
5    4V3I  DLTRP         42           18         1           17          12          601   −1.9
5    3T6R  ARTKQ         42           22         0           18          18          605   −4.1
6    1SVZ  PQFSLW        56           21         4           17          11          777   0.9
6    3D1E  GQLGLF        45           20         1           15          10          634   −0.2
6    3IDG  ALDKWD        53           23         2           19          12          746   −0.5
6    3LNY  EQVSAV        44           21         0           18          11          631   −2.9
6    4NNM  YPTSII        49           21         2           16          10          693   0.2
6    4Q6H  VQDTRL        51           25         0           21          16          731   −2.8
7    3MMG  ETVRFQS       61           30         1           24          18          866   −3.7
7    3Q47  NPISDVD       53           23         1           22          11          757   −4.0
7    3UPV  PTVEEVD       55           24         1           22          9           785   −1.9
7    4QBR  ARTKQTA       54           28         0           23          21          777   −5.5
7    3NJG  PQIINRP       59           24         2           22          16          838   −2.0
8    1ELW  GPTIEEVD      60           27         1           24          10          856   −2.7
8    3CH8  PQPVDSWV      66           24         4           23          12          926   −1.2
8    4WLB  SLLKKLLD      65           35         0           22          17          930   0.7
8    1OU8  GAANDENY      60           27         1           26          15          851   −6.4
8    1N7F  ATVRTYSC      62           31         1           24          19          901   −3.6
9    3OBQ  PTPSAPVPL     62           20         4           21          9           878   −1.2
9    4BTB  PPPPPPPPP     64           8          9           19          2           892   0.0
9    2W0Z  APPPRPPKP     68           19         6           23          13          958   −1.8
9    4N7H  EAPPSYAEV     68           27         3           25          11          960   −2.3
9    2QAB  KILHRLLQD     80           40         1           29          22          1136  −0.9
10   1H6W  SLNYIIKVKE    85           44         1           29          22          1207  −0.4
10   3BRL  ATSAKATQTD    69           32         0           30          21          993   −8.6
10   1NTV  NFDNPVYRKT    89           40         3           33          25          1254  −4.6
10   4DS1  YAESGIQTDL    77           38         1           30          17          1094  −4.1
10   2O02  GLLDALDLAS    69           33         0           26          13          985   −1.3
11   1N12  SDVAFRGNLLD   85           40         1           33          21          1205  −3.9
11   2XFX  VGYPKVKEEML   90           44         2           30          19          1293  0.3
11   3BFW  DSTITIRGYVR   90           45         1           35          27          1281  −3.6
11   4EIK  SLARRPLPPLP   86           33         4           30          20          1219  −0.6
11   3DS1  ITFEDLLDYYGP  96           44         3           32          16          1345  1.1
12   4J8S  RRLPIFNRISVS  103          49         2           38          32          1461  −2.5
12   2W10  PPPRPTAPKPLL  91           31         6           30          17          1286  −0.4
12   3JZO  LTFEHYWAQLTS  107          48         5           36          22          1495  −1.7
12   4DGY  QLINTNGSWHIN  99           46         3           38          26          1397  −7.2
12   2B9H  RRNLKGLNLNLH  102          51         1           40          34          1451  −5.7

(a) LEADS-PEP benchmark data set sorted by peptide length (res, number of residues). For each peptide several physicochemical properties (calculated within MOE) are listed (acc, acceptor; don, donor).

peptides several quality measures (e.g., stereochemical properties) were considered for the selection. Complexes containing heteroatoms (e.g., buffer molecules) in close distance to the peptides and/or peptides with missing atoms were discarded. LEADS-PEP includes proteins of not more than 30% sequence identity. For generation of the current release we mainly

Figure 1. Overview of CPU time required by tested docking approaches for each peptide. Calculations were performed on an Intel Xeon E5-2620 CPU at 2.00 GHz. Abbreviations: SA, standard accuracy settings; HA, high accuracy settings.

settings (5.9 min). However, it should be noted here that Vina is parallelized and was run using 8 threads in this study. Computing times for peptides using GOLD with the scoring functions ChemScore (CS, 8.1 min) and ASP (8.2 min) were slightly higher. GoldScore (GS) proved to be the slowest GOLD scoring option (27.1 min). Using SurflexSA, the computing time for a peptide was approximately 13 min. With high accuracy (HA) settings the computing time increased to 42.4 min per peptide, which is only slightly longer than for AutoDockSA (40.9 min). AutoDock with high accuracy settings required by far the most computing time per peptide (419.2 min). The standard measure for assessing the accuracy of redocking performance is the root-mean-square deviation (RMSD) between docked pose and the experimentally determined conformation.
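As a rough illustration of this measure, the sketch below computes a backbone RMSD in Python with NumPy and applies the 2.5 Å near-native criterion used in this study. It assumes that both coordinate arrays list the same backbone atoms (N, CA, C) in the same order and share the receptor's coordinate frame, so no superposition is performed; the authors' own shell and SPL scripts are not shown in the text and may differ in detail.

```python
import numpy as np

NEAR_NATIVE_CUTOFF = 2.5  # Å, backbone RMSD criterion for a near-native pose


def backbone_rmsd(docked_xyz, reference_xyz):
    """RMSD in Å between matched backbone atoms, without superposition."""
    docked = np.asarray(docked_xyz, dtype=float)
    reference = np.asarray(reference_xyz, dtype=float)
    if docked.shape != reference.shape:
        raise ValueError("pose and reference must contain the same atoms")
    squared_dist = ((docked - reference) ** 2).sum(axis=1)  # per-atom squared distance
    return float(np.sqrt(squared_dist.mean()))


def is_near_native(docked_xyz, reference_xyz, cutoff=NEAR_NATIVE_CUTOFF):
    return backbone_rmsd(docked_xyz, reference_xyz) <= cutoff


# Invented coordinates: a three-atom backbone shifted rigidly by 1 Å along x
reference = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.5, 1.0, 0.0]]
pose = [[1.0, 0.0, 0.0], [2.5, 0.0, 0.0], [3.5, 1.0, 0.0]]
print(backbone_rmsd(pose, reference), is_near_native(pose, reference))
```

Skipping the superposition step is deliberate for redocking: a pose that has the right shape but sits in the wrong place in the binding site should count as a failure.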
Here, a docking pose was considered a near-native conformation if its backbone RMSD was ≤2.5 Å.7 At first, we investigated the RMSD of top-scored docking poses. An overview of the deviation from the experimentally determined peptide coordinates for all programs tested on the LEADS-PEP data set is given in Table 2. Considering the median RMSD over the whole benchmark data set for the top-ranked pose, GOLD with the GS scoring function proved to be the most accurate docking approach (4.5 Å), closely followed by SurflexSA (4.8 Å), GOLD:CS (4.9 Å), and SurflexHA (5.0 Å). For all other tested docking approaches the median RMSD was above 6 Å. Most programs were capable of reconstructing conformations of shorter peptides (3−4 residues) quite accurately, while longer peptides often caused problems (Table 2). For 4 peptides all docking approaches correctly reproduced the experimentally determined binding modes (1B9J, 1TW6, 4C2C, 4J44), while for 18 others all programs failed to identify a native-like conformation. Using the number of near-native poses as the assessment criterion, SurflexSA performed best, with 38% of the 53 top-ranked docking poses adopting a near-native conformation. The program not only correctly placed drug-like short peptides (7 out of 11) but also successfully reproduced
concentrated on peptides adopting turn or coil conformations. Ten peptides contain secondary structures. The percentage of residues with secondary structure in these peptides ranges between 33 and 82%. More detailed information on the workflow is shown in Figure S1. The outcome of our selection procedure was a data set comprising 53 high-resolution protein−peptide complexes with peptides composed of 3 to 12 residues and having between 7 and 51 rotatable bonds. Table 1 provides an overview of the data set along with some molecular properties of the peptides.
Only peptides possessing between 2 and 4 residues revealed drug-like properties as defined by Lipinski’s “Rule-of-Five”,31 which is often used as a probability criterion in drug discovery to estimate oral bioavailability. All peptides possessing more than four residues featured several “Rule-of-Five” violations. In order to ensure a neutral starting structure, all peptides to be docked were generated as extended conformations (φ/ω torsion angles adopting 180°), and it was ensured that the atomic coordinates do not overlap with the binding sites. Evaluation of Small Molecule Docking Programs. In a second step the LEADS-PEP benchmark data set was utilized for a detailed analysis of the peptide docking performance of several popular docking tools, namely AutoDock, AutoDock Vina (hereafter termed Vina), Surflex-Dock (hereafter termed Surflex), and GOLD. None of these programs has been specifically designed for handling peptides or other highly flexible ligands. Settings of the programs were not significantly changed compared to those usually used for small molecule docking. In particular, this included a limited number of docking runs (20). First of all, we analyzed the CPU time required by each program (Figure 1 and Table S1). In general, the computing time increased with residue length and the time difference between shortest and longest peptide reached up to 2 orders of magnitude for the same program. With a median CPU time of 5.6 min, GOLD with ChemPLP (CP) emerged as fastest program, closely followed by Vina using standard accuracy (SA) 191 DOI: 10.1021/acs.jcim.5b00234 J. Chem. Inf. Model. 2016, 56, 188−200 Article Journal of Chemical Information and Modeling reached 23% success rate. AutoDockSA and both Vina approaches were capable to reproduce 19% of the 53 peptides correctly. GOLD:ASP (15%) and AutoDockHA (17%) showed worst performance. 
Only VinaHA, Surflex using standard and high accuracy settings as well as GOLD:GS were capable to identify native conformations of peptides containing 10 or more residues. Application of high accuracy settings for AutoDock, Vina, and Surflex did not result in improved overall performance. The median RMSD over the whole benchmark data set using AutoDockHA was almost identical compared to the approach using SA settings and the number of near-native conformations even slightly dropped. For VinaHA and SurflexHA the median RMSD was marginally higher compared to results obtained for standard accuracy settings. The number of near-native poses was identical for both Vina settings but declined by four when using SurflexHA instead of SurflexSA. The number of peptide conformations reproduced correctly by both SA and HA settings was 6 for AutoDock, 8 for Vina, and 13 for Surflex. For 2XFX the application of VinaHA and SurflexHA resulted in near-native conformations of docking poses while both programs failed to produce accurate peptide conformations when using SA settings. In case of 3UPV, 3NJG, and 4DS1, SurflexHA outperformed the same program using standard accuracy settings. For a number of protein−peptide complexes docking programs with standard accuracy settings revealed near-native poses but failed to identify a correct pose among the top-ranked conformations when used with HA settings. Two such incidences occurred when using Vina (2OY2, 3LNY), four in case of AutoDock (2OXW, 2HPL, 3D1E, 4BTB), and even seven when applying Surflex (2HPL, 3NFK, 3IDG, 4NNM, 1OU8, 2W0Z, and 1H6W). Figure 2 shows selected examples of near-native peptide conformations produced by different docking programs. VinaSA docked the largely solvent-exposed 3-mer peptide of 3BS4 correctly and the backbone RMSD compared to the X-ray crystal structure was just 0.6 Å (Figure 2A). Only the Nterminus of the peptide was not correctly placed, resulting in a larger deviation of the asparagine side chain. 
Although the backbone RMSD of the 2HPL pentapeptide docked with AutoDockSA is reasonably low (1.7 Å), the position of the Nterminal residue was less accurate (RMSD = 4.0) (Figure 2B). Nevertheless the docking pose revealed complete reproduction of the intermolecular hydrogen bond interaction pattern, and only the intramolecular hydrogen bond shared between the aspartate side chain and glycine backbone revealed as shifted toward the tyrosine backbone nitrogen (data not shown). The heptapeptide of 3MMG docked using VinaHA showed different orientations of both terminal residues. Since VinaHA placed four out of five central residues with high accuracy, the overall backbone RMSD was 1.2 Å (Figure 2C). However, it failed to reproduce all hydrogen bonds between protein and peptide. With exception of both terminal residues, SurflexSA positioned the nonamer peptide of 2W0Z very accurately (RMSD = 1.3 Å; Figure 2D). GOLD:GS reproduced the 1H6W peptide binding mode for seven of the ten residues with high accuracy (Figure 2E). Only the N-terminal amino acids revealed larger deviations from the X-ray crystal structure and the backbone RMSD for the whole peptide is 1.1 Å. The conformation for the 2XFX peptide generated by SurflexHA differed only by 1.4 Å from the X-ray crystal structure (Figure 2F). Coordinates of N- and Cterminus matched the crystal structure well but positions of some largely solvent-exposed residues showed larger deviations Table 2. Peptide Docking Performance as Measured by Best Scored Binding Modesa a Backbone RMSD of the best scored poses are shown in a gradient color code. Highly accurate poses (10.0 Å) is highlighted in dark red. Abbreviations: res, residues; SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore. near-native conformations of several longer peptides, including also two peptides comprising 11 residues. 
Only three other approaches reached a success rate of 30% (SurflexHA, GOLD:CS, GOLD:GS). GOLD:CP was on third place and 192 DOI: 10.1021/acs.jcim.5b00234 J. Chem. Inf. Model. 2016, 56, 188−200 Article Journal of Chemical Information and Modeling Figure 2. Selected examples of accurately reproduced peptide binding modes. (A) 3BS4 (peptide length: 3 aa, method VinaSA), (B) 2HPL (5 aa, AutoDockSA), (C) 3MMG (7 aa, VinaHA), (D) 2W0Z (9 aa, SurflexSA), (E) 1H6W (10 aa, GOLD:GoldScore), (F) 2XFX (11 aa, SurflexHA). For 2XFX all side chain atoms except for lysine were removed for clarity. Proteins are shown as surface (carbon atoms in gray), peptides as capped sticks (cocrystallized carbon atoms in orange, docked carbon atoms in green). resulting in a completely different orientation of the lysine side chain. Binding modes of inaccurately docked poses revealed shifting along the backbone, large reorientations of the N- and/or Cterminal region or even completely inverted peptides compared to the X-ray crystal structure conformation (Figure 3, left panel). In a next step we investigated whether other peptide conformations within the set of 20 poses generated for each peptide better agree with the reference structure. Thus, for each docking approach the peptide pose with lowest RMSD to the cocrystallized conformation was extracted. Compared to the top-ranked poses, the median RMSD for the best pose set was significantly lower (Table 3). The drop in RMSD varied between 1.5 and 5.0 Å. Four approaches (VinaHA, SurflexSA, SurflexHA, and GOLD:GS), revealed a median RMSD ≤ 2.5 Å. For all docking programs, the number of near-native poses increased for the best pose set compared to the set containing top-ranked poses (Table 3). SurflexSA performed best and identified 29 near-native poses. Results for VinaHA, SurflexHA and GOLD:GS were almost equally well, resulting in 28 correctly reproduced peptide conformations. VinaSA and GOLD:CS were able to identify 20 near-native poses. 
GOLD with either ASP or CP placed 17 peptides correctly. Finally, both AutoDock approaches revealed the lowest number of near-native poses (SA 11, HA 12). Furthermore, we investigated the actual number of nearnative conformations within the set of 20 docking poses (Table 4). Despite similar overall performance in terms of median RMSD (best pose) and number of near-native occurrences over the whole benchmark data set, results for VinaHA, GOLD:GS, and Surflex (both settings) largely differed. Application of VinaHA resulted in 101 (9.5% of the set of 1060 poses) and GOLD:GS produced 183 near-native poses (17.3%). SurflexSA and SurflexHA generated 340 (32.1%) and 317 (29.9%) nearnative conformations, respectively. For six peptides both SurflexSA and SurflexHA achieved the maximum number of near-native poses. In case of SurflexSA, this included not only short (3−4 res, 1B9J, 3BS4, 4C2C), but also long peptides (10−11 res, 1H6W, 1N12, 3BFW). SurflexHA generated 20 near-native poses for four short (1B9J, 3BS4, 1TW6, 4C2C), one medium-sized (3MMG), and one long (1N12) peptide. Figure 3. Identification of near-native docking poses using rescoring. (A) 1UOP (peptide length: 4 aa, method: VinaHA and ChemPLP). (B) 1SVZ (6 aa, VinaHA and ChemPLP). (C) 1OU8 (8 aa, SurflexHA and ASP). (D) 4DGY (12 aa, VinaHA and ChemPLP). Proteins are shown as surface (carbon atoms in gray), peptides as capped sticks (cocrystallized carbon atoms in orange, best-scored pose green, best rescored pose cyan). GOLD:GS achieved the maximum number of near-native poses only for drug-like peptides (1B9J, 2OY2, 4C2C, 4J44). For either AutoDock or Vina (both settings) the maximum number 193 DOI: 10.1021/acs.jcim.5b00234 J. Chem. Inf. Model. 2016, 56, 188−200 Article Journal of Chemical Information and Modeling set of 20 docking poses (Table 4). For example SurflexSA produced up to three near-native conformations for six peptides of which one was also top-ranked (2W0Z). 
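The pose-level percentages quoted above follow from the size of the pose pool: 53 peptides with 20 docking poses each give 1060 poses per approach. A short Python check of that arithmetic, using only the counts stated in the text (the variable names are illustrative):

```python
N_PEPTIDES = 53
POSES_PER_PEPTIDE = 20
TOTAL_POSES = N_PEPTIDES * POSES_PER_PEPTIDE  # 1060 poses per docking approach

# Near-native pose counts as reported in the text
near_native_counts = {
    "VinaHA": 101,
    "GOLD:GS": 183,
    "SurflexSA": 340,
    "SurflexHA": 317,
}


def percent_near_native(count, total=TOTAL_POSES):
    """Share of near-native poses in percent, rounded to one decimal."""
    return round(100.0 * count / total, 1)


for approach, count in near_native_counts.items():
    print(f"{approach}: {count}/{TOTAL_POSES} = {percent_near_native(count)}%")
```

The spread between the approaches is much wider at the pose level than in the count of peptides with at least one near-native pose, which is the point the paragraph makes.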
Application of SurflexHA resulted in seven peptides with few near-native docking poses and for two peptides (3NJG, 1ELW) SurflexHA top-ranked these conformations. In more than a third (19) of the 53 peptides VinaHA revealed three or less near-native poses among the 20 peptide conformations. Only in four cases (3BS4, 4C2C, 4NNM, 2XFX) the top-ranked conformation was also a near-native pose. GOLD:GS revealed several peptides (15) with up to 3 near-native conformations with 4 of them topranked by GoldScore (3MMG, 4N7H, 1H6W, 3BRL). Utilization of Rescoring for Improving Peptide Docking Results. The clear discrepancy between the number of top-ranked near-native poses and existing low RMSD conformations indicated a substantial potential for improving the outcome of peptide docking scenarios for several programs. One option is to re-evaluate the docking poses using other scoring functions. The docking program GOLD provides the opportunity to rescore preexisting docking poses using its internal scoring functions. In order to evaluate whether utilization of this functionality may narrow the RMSD gap between top-ranked and best pose set, all docking poses generated during our evaluation were rescored with CP, CS, GS, and ASP, respectively. Table 5 provides a summary of the best rescoring option over the whole data set. Most programs did not show any improvement of the overall median RMSD and when using GOLD with scoring functions ASP and CS the performance even declined. Only for VinaHA (3.5 Å) the median RMSD dropped significantly. However, much more important than the overall RMSD improvement is the capability to actually increase the number of near-native poses within the top-ranked rescored conformations. Table 6 shows the comparison of near-native conformations of topscored, best poses and top-rescored poses. Results for GOLD:CP did not improve when using rescoring and for AutoDockSA the number of top-rescored poses declined compared to the top-ranked poses. 
For all other approaches the number of near-native poses increased between 6 and 110%. In combination with ASP, CP or CS, VinaHA revealed best overall improvement and all three docking/rescoring combinations resulted in 21 near-native conformations, compared to only 10 considering the top-ranked pose. In combination with CP the additional near-native poses comprised peptides composed of 4 (Figure 3A), 5, 6 (Figure 3B), 7, and 8 as well as 11 and 12 (Figure 3D) residues, respectively. Also the performance of SurflexHA in combination with ASP rescoring improved significantly (38%), resulting in 22 nearnative conformations. Additional peptides with near-native conformations are composed of five, six, eight (Figure 3C) and ten residues, respectively. In case of SurflexSA, identified as the most performant tool considering the best scored set, rescoring increased the number of near-native conformations by 10% to 22 using either CP or CS. All other docking/rescoring approaches always resulted in less than 20 near-native conformations (Table 6). Although rescoring turned out to significantly improve the overall redocking performance for several programs, in few cases the top-rescored poses adopted non-native conformations while the best scored docking pose fulfilled the 2.5 Å criterion. Such unwanted events occurred for example when using VinaHA with CP or CS rescoring (2XFX, top-scored pose: 2.0 Å, top- Table 3. Peptide Docking Performance as Measured by Best Sampled Binding Modesa a Lowest backbone RMSD values from each docking run are shown in a gradient color code. Highly accurate poses (10.0 Å) is highlighted in dark red. Abbreviations: res, residues; SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore. of near-native poses per peptide did not exceed 17 (AutoDockHA, 2OY2) or 11 (VinaHA, 4J44), respectively. 
In several cases VinaHA, SurflexSA, SurflexHA and GOLD:GS generated only few (1−3) near-native conformations within the 194 DOI: 10.1021/acs.jcim.5b00234 J. Chem. Inf. Model. 2016, 56, 188−200 Article Journal of Chemical Information and Modeling Table 4. Number of Near-Native Poses Generated by Each Docking Approacha AutoDock res 3 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8 9 9 9 9 9 10 10 10 10 10 11 11 11 11 11 12 12 12 12 12 sum Vina Surflex GOLD PDB SA HA SA HA SA HA ASP CP CS GS 1B9J 2OY2 3GQ1 3BS4 2OXW 2B6N 1TW6 3VQG 1UOP 4C2C 4J44 2HPL 2 V3S 3NFK 1NVR 4V3I 3T6R 1SVZ 3D1E 3IDG 3LNY 4NNM 4Q6H 3MMG 3Q47 3UPV 4QBR 3NJG 1ELW 3CH8 4WLB 1OU8 1N7F 3OBQ 4BTB 2W0Z 4N7H 2QAB 1H6W 3BRL 1NTV 4DS1 2O02 1N12 2XFX 3BFW 4EIK 3DS1 4J8S 2W10 3JZO 4DGY 2B9H 6 8 3 0 5 0 14 0 2 5 10 2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 57 14 17 13 2 7 0 16 0 4 13 8 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 97 3 1 6 3 2 1 4 5 0 3 6 2 4 0 0 0 0 0 0 0 3 1 0 0 0 2 2 1 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 54 9 4 10 3 2 0 6 6 2 3 11 3 2 2 1 0 0 3 0 1 2 3 0 4 3 2 6 5 0 2 0 0 0 0 3 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 101 20 2 13 20 1 0 19 10 7 20 11 9 0 10 0 0 0 3 5 17 0 19 0 17 0 8 18 6 12 1 0 16 0 14 0 1 0 0 20 0 0 1 0 20 0 20 0 0 0 0 0 0 0 340 20 0 16 20 0 0 20 19 0 20 4 2 0 9 2 0 0 3 4 0 0 18 10 20 0 5 13 3 1 0 0 12 0 13 0 0 0 0 16 1 0 11 0 20 15 19 1 0 0 0 0 0 0 317 20 20 5 6 9 2 10 1 1 8 14 0 3 1 0 1 1 0 0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 106 20 20 15 11 4 0 14 0 0 10 17 0 6 0 0 2 6 0 1 0 3 1 0 0 0 0 0 0 0 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 134 20 20 17 14 18 3 8 1 0 13 16 1 4 0 0 5 4 0 1 0 6 2 2 0 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 158 20 20 15 9 1 3 19 4 5 20 20 0 6 2 0 3 7 0 0 0 1 8 0 1 1 0 7 2 3 0 0 0 0 1 0 0 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 183 a Abbreviations: res, residues; SA, standard accuracy; HA, 
high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore. rescored pose 3.8 Å), SurflexSA with CS (1OU8, 1.7 vs 2.8 Å; 2W0Z, 1.3 vs 4.3 Å), SurflexHA with ASP (4QBR, 1.2 vs 12.3 Å) or GOLD:GS with ASP (3BS4, 0.9 vs 5.4 Å; 4QBR, 1.9 vs 11.5 Å). 195 DOI: 10.1021/acs.jcim.5b00234 J. Chem. Inf. Model. 2016, 56, 188−200 Article Journal of Chemical Information and Modeling docking. We restricted our analysis to docking/rescoring approaches with at least 33% success rate. The data set for analysis included VinaHA:CP, SurflexSA:CS, SurflexHA:ASP, and GOLD:GS:ASP. At first, we evaluated whether an increasing number of rotatable bonds (see also Table 1) has negative effects on the docking performance (Figure 4A). This was true for all programs, in particular the performance of GOLD:GS:ASP was highly dependent on the number of rotatable bonds. Also we evaluated the impact of the peptide conformation in the bound state on the docking performance. For this purpose the ratios between maximum Cα−Cα distances for bound and linearized peptides were determined. A low ratio indicates a more folded peptide while a high ratio reveals a linear conformation. Within the LEADS-PEP data set 4DGY (bound/linear ratio: 0.13), 4J8S (0.42), and 3DS1 (0.43) possess lowest elongation while several peptides are fully stretched when in complex with their target protein (4C2C, 1.02; 3OBQ, 1.09; 2W0Z, 1.26). Except for VinaHA:CP, we observed a correlation between elongation and RMSD for both Surflex approaches as well as GOLD (Figure 4B). There was also a clear trend for these three approaches when investigating the number of intramolecular hydrogen bonds in the peptides. These are expected to occur in peptides with more condensed conformation. Within LEADSPEP peptides contain between zero and nine such hydrogen bonds. As shown in Figure 4C, an increase in intramolecular hydrogen bonds correlated with a loss in docking accuracy for Surflex and GOLD. 
Recently, it has been suggested that the presence of free charged side chains strongly correlates with docking success.13 In our data set, the majority of peptides possesses no (36) or a single (14) free charged side chain. Of the four best docking/scoring options, only VinaHA:ASP performance revealed as strongly dependent on the number of free charged side chains (Figure 4D).

Table 5. Overview of Best Docking/Rescoring Combinationsa

■ DISCUSSION
For determination of binding modes of small molecules at their protein target or for identification of bioactive molecules from a set of active and nonactive compounds current docking programs and scoring functions have reached an acceptable performance.33,34 However, docking of highly flexible peptides still remains a computational challenge and only few specific peptide docking programs have been developed so far. Up to now, no standards regarding an intermethod comparison of docking and scoring performance have been established as has been done for small molecule docking and screening.14−16 In order to evaluate their tools, developers of peptide docking programs often have used self-constructed benchmark data sets, therefore a biased selection cannot be completely excluded and usually the prepared structures are not available for other researchers. Reconstruction of these data sets is error-prone, as the preparation procedure of the proteins (e.g., protonation states, amide and histidine side chain corrections, inclusion of water molecules) may differ. Therefore, we created a benchmark data set for peptide docking with several advantages: (a) LEADS-PEP is publicly available at www.leads-x.org. (b) It is not biased toward any docking program. (c) It is ready-to-use as fully prepared protein and peptide structures are provided. Our collection contains 53 high-resolution protein−peptide complexes with peptide lengths ranging from 3 to 12 residues.
The selection process was guided by quality and sequential diversity of the structures using an objective and reproducible workflow. Starting from a large set a Backbone RMSD of the best docking/scoring combinations are shown in a gradient color code. Highly accurate poses (10.0 Å) is highlighted in dark red. Abbreviations: res, residues; SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore. Molecular Properties Determining Peptide Docking Success. In a last step we intended to investigate molecular properties of the peptides influencing the success of peptide 196 DOI: 10.1021/acs.jcim.5b00234 J. Chem. Inf. Model. 2016, 56, 188−200 Article Journal of Chemical Information and Modeling Table 6. Overview of Near-Native Conformations Obtained with Different Approaches and Best Rescoring Optionsa AutoDock best score best pose rescoring best rescoring options Vina Surflex GOLD SA HA SA HA SA HA ASP CP CS GS 10 11 9 ASP CS 9 12 11 CS 10 20 12 CS CP GS 10 28 21 CP CS ASP 20 29 22 CS CP 16 28 22 ASP 8 17 12 CS 12 17 12 CP 16 20 17 GS 16 28 18 ASP CS a Number of near-native conformations obtained for top-ranked docking pose (best score), pose with lowest RMSD (best pose), and top-rescored pose (rescoring). Best rescoring options are sorted by overall median RMSD. Abbreviations: SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore. Figure 4. Evaluation of factors affecting performance for best performing docking/rescoring options. (A) Number of rotatable bonds (VinaHA:CP, Pearson’s correlation coefficient r = 0.31, two-tailored p value = 0.02*); SurflexSA:CS, r = 0.41, p = 0.003**; SurflexHA:ASP, r = 0.37, p = 0.007**; GOLD:GS:ASP, r = 0.51, p = 0.0001***). 
(B) Ratio between maximum Cα−Cα of bound and linear peptides (VinaHA:CP, r = 0.02, p = 0.89; SurflexSA:CS, r = 0.40, p = 0.003**; SurflexHA:ASP, r = 0.41, p = 0.002**; GOLd:GS:ASP, r = 0.40, p = 0.003**). (C) Number of intramolecular hydrogen bonds (VinaHA:CP, r = 0.10, p = 0.46; SurflexSA:CS, r = 0.48, p = 0.0003***; SurflexHA:ASP, r = 0.54, p =

Explanation & Answer

Kindly see attached file.

The file is divided into:
Title page
Table of contents
Acknowledgement
Abstract
Introduction
Materials and methods
Results
Discussion of results
Conclusions
References


(TITLE)
(NAME)
Master Thesis
(UNIVERSITY)
(DATE)

Table of contents
Acknowledgements
Abstract
1. Introduction
1.1. What is molecular modelling?
1.2. What are the steps involved in molecular modelling for the assessment of the docking reaction between peptides and proteins?
1.3. Previous research: Development of a benchmark data set for the assessment of peptide docking by LEADS-PEP
2. Objectives
3. Methods
3.1. Materials
3.2. Assessment of docking in the LEADS-PEP benchmark data set by Vina molecular modelling method
3.3. Assessment of docking in the LEADS-PEP benchmark data set by Plants molecular modelling method
4. Results
5. Discussion of the results
6. Conclusions
References

Acknowledgements
I would like to thank the University of …. for having
provided me with all the necessary means to carry out the
research presented in the present Master Thesis. Furthermore, I
would like to thank my tutor …. for his/her guidance and support
throughout the whole process.
I am especially thankful to my family and friends for their
constant support and encouragement during this year and
throughout my studies.
Finally, I would like to thank the …. for the financial
assistance provided, which enabled me to carry out the work
described in this Master Thesis.

Abstract
Two molecular modelling programs, VINA and PLANTS, were evaluated for their
ability to predict the docking behavior of the small peptides contained in the
LEADS-PEP database.
The comparison of the results confirmed previously published findings according
to which the VINA software is useful for evaluating the docking of small peptides
even though it was not originally developed for that purpose. The RMSD values
obtained indicate that VINA provided accurate results for 45 of the 53 peptides
analyzed in this study, whereas PLANTS described the docking behavior accurately
for only 28 of the 53 peptides.

1. Introduction
1.1. What is molecular modelling?

Molecular modelling is the branch of science that uses computers to build and
refine the structure of molecules by means of models based on quantum mechanics.
Most of these models rely on solving the Schrödinger equation shown below:

𝐻𝛹 = 𝐸𝛹

where H represents the Hamiltonian operator, Ψ is the wave function and E is the
energy.

Solving this equation is relatively simple for the hydrogen atom, but becomes
highly complicated for complex molecules. For this reason, several approximations
have been developed to obtain solutions for complex systems such as proteins and
small peptides like the ones analyzed in the present thesis.
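
As an illustration of why the hydrogen atom counts as the simple case, the equation can be written out explicitly for one electron bound to one nucleus. The block below is a textbook sketch and not part of the calculations in this thesis; the symbols μ (reduced mass), e (elementary charge), ε0 (vacuum permittivity), h and ħ (Planck constant and reduced Planck constant), r (electron-nucleus distance) and n (principal quantum number) are introduced here only for this example.

```latex
% Time-independent Schrödinger equation, same symbols as in the text
H\Psi = E\Psi

% Hamiltonian of the hydrogen atom: kinetic term plus Coulomb attraction
H = -\frac{\hbar^{2}}{2\mu}\nabla^{2} - \frac{e^{2}}{4\pi\varepsilon_{0} r}

% Exact energy levels obtained for this Hamiltonian
E_{n} = -\frac{\mu e^{4}}{8\varepsilon_{0}^{2} h^{2}}\,\frac{1}{n^{2}}
      \approx -\frac{13.6\ \text{eV}}{n^{2}}, \qquad n = 1, 2, 3, \dots
```

For a molecule with many electrons the Hamiltonian also contains electron-electron repulsion terms, and no closed-form solution of this kind exists, which is why approximate methods are required.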

Molecular modelling enables scientists to predict not only the structure of a
given molecule, but also its reactivity in a given environment. Such predictions
offer a straightforward way to validate the results of molecular modelling: a
property such as the acidity of the molecule can be predicted and then compared
with the value obtained from an acid-base titration experiment. Since several
approximations are used during the modelling process, only an approximate result
will be obtained in most cases.

Nevertheless, being able to validate the calculated results in this way is
critical for deciding which calculation method is best suited to a given
application. The common practice consists of modelling the chemical structure of
the molecule with different approximation methods and then comparing the results
to determine which one best describes the experimentally observed variable.

Molecular modelling techniques are commonly used in physical chemistry,
inorganic chemistry and biochemistry. They have been applied extensively to
predict the structure and reactivity of small organic molecules, polymers,
inorganic solids, inorganic liquids, liquid crystals and proteins. In biochemistry,
molecular modelling techniques are especially useful for the conformational
analysis of proteins, peptides and enzymes, among others. Hence, they can be used
to identify the amino acids that form the active site and to predict its
three-dimensional structure.

The main attraction of these models is their use in the design of new drugs.
The application of molecular modelling to drug design, however, only became
possible after the exponential increase in the number of known protein and enzyme
structures determined by techniques such as nuclear magnetic resonance over the
last five decades (Meyer, Swanson & Williams, 2000). The inclusion of these
structures in databases has enabled scientists to access the required information
quickly instead of having to purify the protein and characterize its structure
themselves.

Furthermore, the advances in computing made in recent years have enabled:

• An exponential decrease in the time required to perform each calculation

• A decrease in the hardware requirements, meaning that nowadays anybody can
run a simple calculation without needing access to especially powerful
computers

1.2. What are the steps involved in molecular modelling for the assessment of the
docking reaction between peptides and proteins?

Basically, the procedure used in molecular modelling for the assessment of the
complexation or docking reaction between peptides and proteins can be summarized as
follows:

1. Identification and characterization of the target proteins: In this stage, it is
important that the target proteins are identified and characterized through
molecular modelling to obtain their three-dimensional structure (Figure 1) and,
most importantly, the sequence of amino acids present in the active site of the
protein and to which the peptide will need to bind.

Figure 1. Sample of a three-dimensional structure of a protein obtained through
molecular modelling (Biologii.net, n.d.)

2. Once the target proteins have been identified, candidate peptides are selected
from the available database according to the amino acid sequence and structure
of the active site of the protein. Here it is important to consider the chemical
reactivity between the functional groups present in the amino acids. For
example, we should look for cysteine residues that might form an intermolecular
disulfide bond between the protein and the peptide, or for acidic and basic
residues, which can interact through an acid-base reaction and form an ionic
interaction (salt bridge). The formation of these bonds significantly increases
the stability of the complex and therefore leads to a higher-affinity docking
process. In addition to chemical reactivity, we should look for complementarity
between the structures of the peptide and the protein (Figure 2).

Figure 2. Dock formed between the protein (green) and the peptide (purple) (Huang,
2014)

3. Once a set of candidate peptides that can interact favorably with the protein
has been identified, their three-dimensional structure is either retrieved from
the database (if available) or modelled. In this stage, special attention is paid
to modelling the region of the peptide that will interact with the protein, as
well as any surrounding amino acids that could cause steric repulsion and
destabilize the protein-peptide complex.
4. When the structures of both the protein and the candidate peptides are well
characterized, the docking process is modelled. This can be accomplished with
different tools and software. The objective of these programs is to model the
co-crystallized protein-peptide complex and to characterize the binding strength
between the peptide and the protein by analyzing the energy difference between
the separate peptide and protein and the co-crystallized complex.
5. The data obtained from the modelling are interpreted by calculating the root
mean square deviation (RMSD). As with the residuals of a regression model,
calculating the RMSD involves comparing the predicted positions of the atoms in
the model with the experimental positions taken from the crystallographic and
NMR data in the database. The best model is therefore the one with the lowest
RMSD value, as this implies that it reproduces most closely the positions of the
atoms in the binding site of both the peptide and the protein.
6. Finally, the results are validated through a separate set of experiments. A
common practice is to select the two or three peptides that showed the strongest
interaction with the protein according to the model and to measure the binding
constant by preparing different mixtures of peptide and protein and monitoring
the change in a physical or chemical property during the reaction. As an example,
if the model predicts that the interaction between the peptide and the protein
takes place through an acid-base reaction, the change in the pH of the solution
could be used to follow the docking reaction experimentally.
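
The RMSD comparison described in step 5 can be written as RMSD = sqrt((1/N) Σ d_i²), where d_i is the distance between the predicted and the experimental position of atom i and N is the number of atoms compared. The following Python sketch illustrates the calculation under simple assumptions: both poses list the same atoms in the same order, no superposition is applied before the comparison, and the coordinates are hypothetical values chosen for illustration, not data from this work. The function name `rmsd` is likewise only illustrative.

```python
import math


def rmsd(predicted, reference):
    """Root mean square deviation between two equally ordered coordinate lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        # Squared distance between the predicted and the reference atom
        squared_sum += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(squared_sum / len(predicted))


# Hypothetical backbone coordinates (in Å) of a three-atom fragment
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
# The docked pose is the reference shifted by 1 Å along y
docked_pose = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]

print(rmsd(docked_pose, reference_pose))
```

Because every atom of the docked pose is displaced by exactly 1 Å, the RMSD in this example is 1 Å; an identical pose would give 0.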

1.3. Previous research: Development of a benchmark data set for the assessment of
peptide docking by LEADS-PEP

Hauser and Windshügel (2016) recently used molecular modelling to evaluate the
docking performance of several programs on the LEADS-PEP benchmark data set,
which comprises 53 protein-peptide complexes. The formation of such complexes is
important from a biological point of view because they regulate several essential
cellular processes. Since protein-peptide interactions have been estimated to
account for up to 40% of all interactions within the cell, their formation or
dissociation is currently being used as the basis of peptide-based therapeutics
for the treatment of e.g. cancer, hepatitis C or metabolic diseases.

Taking this into account, the work of Hauser and Windshügel is a very good
example of how molecular modelling can assist scientists in developing new drugs
that are more effective and cause fewer side effects. The interaction between the
peptide (the designed drug) and the protein in the cell is similar to that
between an enzyme and its substrate or between an antibody and its antigen. This
interaction is therefore presumed to be highly specific and very stable.

However, despite the importance that research in this field would have for the
development of a more systematic drug design protocol, very few studies have been
carried out so far. Although several programs are currently able to perform
peptide docking, such as the ones used in the present work, a comparison of the
performance of these programs on a single common data set is still missing.
Additionally, the fact that the data sets are not publicly available is one of
the most important drawbacks for the application of molecular modelling to the
evaluation of protein-peptide docking, as it limits the comparability of the
results obtained with different modelling methods.

Considering, however, that molecular modelling has proven to be a valuable
technique for modelling complexation reactions of small organic and inorganic
molecules, scientists expect that similar results should be obtained in the
evaluation of peptide docking.

In this study, Hauser and Windshügel evaluated a total of 53 peptides of
well-characterized sequence contained in the LEADS-PEP database. This database
has the advantage of being publicly available. The peptides were selected to
cover the full range of acid-base properties, as evaluated through the H-bond
donor and H-bond acceptor properties of the amino acids present in each peptide.
Doing so enabled the researchers to evaluate the influence of the pH on the
stability of the complexes formed between the peptides and the proteins.

The peptide length in this study was limited to a maximum of 12 amino acids
because, according to the experiments carried out, the computing time increased
exponentially with the length of the peptide owing to the higher number of atoms
introduced into the model (Figure 3). In addition to the peptide length, the
computing time also depended on the model being used. This can be attributed to
the number of simplifications made: simpler models tend to require less computing
time, but generally provide less accurate results.

Figure 3. Effect of peptide length and model on the computing time required to analyze
the structure and stability of the peptide dock to the target protein (Hauser &
Windshügel, 2016)

Another important result found by these authors is that the peptide length
significantly affects the accuracy of the results. Hauser and Windshügel found
that all the docking programs used in their study were able to model accurately
the backbone structure of short peptides (with just 3 or 4 residues) but failed
to do so for longer peptides. Hence, as can be observed in Figure 4, the RMSD
values obtained with the different models increased with the peptide length:

Figure 4. Average RMSD values obtained for the predicted position of the peptide
backbone by the AutoDock method using standard accuracy. Similar performance was
obtained for other modelling methods

According to the results of the study, the modelling method providing the most
accurate results was Surflex, which accurately reproduced 13 peptide
conformations out of the 53 peptides analyzed. It should be noted, however, that
the best modelling method in terms of RMSD depended on the exact sequence of the
peptide, so that each peptide had to be modelled with all the methods before
determining which one provided the most accurate result. This is a very
significant drawback, as the need to use several modelling methods dramatically
increases the required computing time.

It should be noted, however, that the use of the high accuracy mode did not play
a significant role in the RMSD values obtained in comparison with the standard
accuracy mode. Taking this result into account, and considering that the computing time
is greatly increased by selecting the high accuracy mode, the use of such mode does not
seem to be a valuable tool in the description of the docks formed by short peptides.

The main conclusions that can be derived from the work carried out by Hauser
and Windshügel (2016) can be summarized as follows:

• The LEADS-PEP database fills an existing gap in the collection of data
for assessing the performance of molecular docking software.

• The structure of the peptide-protein complexes contained in the LEADS-PEP
database has been characterized by using different modelling methods.

• Even though these methods were not originally designed for the docking
of peptides, they seem to provide accurate results, especially for small
peptides (of just 3 or 4 amino acid residues).

• The use of the high accuracy mode in any of these modelling methods
does not provide more accurate results and significantly increases the
computing time. Taking this into account, the standard accuracy mode is
selected as optimal for the modelling of the selected peptide-protein
complexes.

• Each particular complex should be modelled with all the available methods
before deciding which one provides the most accurate results. Taking this
into account, further research is necessary in this field to validate the
obtained results.

2. Objectives
Taking this into account, and using the work carried out by Hauser and
Windshügel (2016) as a starting point, the main objective of the work carried out
during my Master project has been to test the affinity between peptides and
proteins by molecular modelling. To do so, I have focused on:

• Evaluating which of the programs (VINA or PLANTS) provides the most
accurate results when assessing the docking between peptides and proteins

• Comparing the results obtained with each of the programs with one another
and with the previously published work of Hauser and Windshügel

• Understanding how the peptide is docked to the protein by each molecular
modelling program and calculating the respective RMSD values.

3. Methods
(You would need to include here the exact settings used in the Plants method. You hadn’t
uploaded the protocol, so I was not able to write it. Additionally, you should include
information on how you have measured the different values presented in the tables in the
results section. For example, which method did you use to estimate the binding affinity? This
is especially important considering that none of your binding affinities is positive: you
should state what quantity they represent (for example, an estimated binding free energy in
kcal/mol, for which negative values indicate favorable binding), as you will most probably be
asked about it)
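
If the reported binding affinities are estimated binding free energies in kcal/mol (an assumption, since the source does not state how they were obtained), they can be related to a dissociation constant through ΔG = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The Python sketch below shows this conversion; the function name and the temperature of 298.15 K are illustrative choices, and the example values -9.7 and -5.5 kcal/mol are the first two entries of Table 3. Docking scores are rough estimates, so the resulting constants should be read as orders of magnitude at best.

```python
import math

# Molar gas constant in kcal/(mol*K)
GAS_CONSTANT_KCAL = 1.987204e-3


def dissociation_constant(delta_g_kcal_per_mol, temperature_kelvin=298.15):
    """Kd in mol/L from a binding free energy, using delta_G = R*T*ln(Kd)."""
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_kelvin))


for delta_g in (-9.7, -5.5, 0.0):
    # A more negative free energy gives a smaller Kd, i.e. tighter binding
    print(delta_g, dissociation_constant(delta_g))
```

Under this reading, a negative value corresponds to Kd below 1 M (favorable binding), and a value of exactly 0 corresponds to Kd = 1 M.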

3.1. Materials
The materials used in these experiments consist of:

• Vega ZZ computer program

• Protein databases: RCSB PDB and LEADS-PEP

• Vina modelling software

• Plants modelling software

3.2. Assessment of docking in the LEADS-PEP benchmark data set by Vina
molecular modelling method
The method used to analyze the docking of the proteins and peptides of the
LEADS-PEP database with the VINA software can be summarized in the following steps:
1. Select a protein and/or peptide from the LEADS-PEP database
2. Open the selected protein using the Vega ZZ software
3. Remove the water molecules from the system
4. Run the receptor docking using VINA with the target protein or peptide
5. Open the selected inhibitor molecule and run the ligand docking using VINA
6. Run VINA docking with the receptor (protein) and ligand (inhibitor) using
the xyz coordinates.
The exact menu commands to follow these instructions in the Vega ZZ software are:

• To open the peptide, click on File > Run script > Docking > Vina > Ligand >
Run

• To open the receptor, click on File > Run script > Docking > Vina >
Receptor.c > Run

• To open a reference peptide, go to the center of the peptide with the mouse
pointer, right click to obtain the coordinates (xyz) of the center, click on
the center and choose chain - change – atom. A table will appear with the xyz
coordinates of the alpha carbon (designated as CA) at the center of the
target peptide.

Once the xyz coordinates of the protein, ligand and reference peptide have
been obtained, the docking process is started using the xyz coordinates obtained
for the reference peptide. The docking was carried out for the different peptides
contained in the LEADS-PEP database, selecting the nine best peptides, i.e. those
that have a higher affinity for the receptor and whose pose is closest to the
reference peptide. The resulting files are then saved with the “.mol2” extension.
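
The use of the reference peptide's central alpha carbon as the docking center can be sketched in Python as follows. This is only an illustration of the idea and not the Vega ZZ script used in this work: the functions `ca_coordinates` and `vina_config`, the file names, the 20 Å box edge and the sample coordinates are all hypothetical. The configuration keys (`receptor`, `ligand`, `center_x`, `center_y`, `center_z`, `size_x`, `size_y`, `size_z`, `num_modes`) are standard AutoDock Vina options; mapping the value nine from the text to `num_modes` is an assumption.

```python
def ca_coordinates(pdb_lines):
    """Collect (x, y, z) of every alpha carbon from fixed-column PDB ATOM records."""
    coords = []
    for line in pdb_lines:
        # Atom name is in columns 13-16, coordinates in columns 31-54
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def vina_config(receptor, ligand, center, box_edge=20.0, num_modes=9):
    """Build the text of an AutoDock Vina configuration file."""
    x, y, z = center
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {x:.3f}",
        f"center_y = {y:.3f}",
        f"center_z = {z:.3f}",
        f"size_x = {box_edge:.1f}",
        f"size_y = {box_edge:.1f}",
        f"size_z = {box_edge:.1f}",
        f"num_modes = {num_modes}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical three-residue reference peptide written as PDB ATOM records
sample_ca_positions = [(10.0, 4.0, -2.0), (12.5, 5.0, -1.0), (15.0, 4.5, 0.5)]
reference_peptide = [
    f"ATOM  {i:5d}  CA  ALA B{i:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"
    for i, (x, y, z) in enumerate(sample_ca_positions, start=1)
]

ca_atoms = ca_coordinates(reference_peptide)
center = ca_atoms[len(ca_atoms) // 2]  # CA of the central residue
config_text = vina_config("receptor.pdbqt", "peptide.pdbqt", center)
print(config_text)
```

Taking the CA of the central residue mirrors the manual procedure described above; using the mean of all CA positions would be an equally simple alternative for peptides that are strongly bent.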
The files obtained for the docking results of the selected nine peptides are
then merged using the Vega ZZ software and the RMSD is calculated for each file
to determine which is the best model. As a rule of thumb, a good model will
present an RMSD value of at most 2.50 Å, and the lower the RMSD value, the better
the model.
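
A minimal Python sketch of how this rule of thumb could be applied once the RMSD values are available is given below. It assumes that one RMSD value has already been calculated for each of the nine docking results; the function names and all numerical values are hypothetical and serve only to illustrate the selection of the best model and the counting of successful cases.

```python
NEAR_NATIVE_CUTOFF = 2.5  # Å, the rule-of-thumb threshold quoted in the text


def best_pose(pose_rmsds):
    """Return (rank, rmsd) of the pose with the lowest RMSD; ranks start at 1."""
    index = min(range(len(pose_rmsds)), key=lambda i: pose_rmsds[i])
    return index + 1, pose_rmsds[index]


def success_rate(best_rmsds, cutoff=NEAR_NATIVE_CUTOFF):
    """Fraction of peptides whose best pose lies within the cutoff."""
    hits = sum(1 for value in best_rmsds if value <= cutoff)
    return hits / len(best_rmsds)


# Hypothetical RMSD values (Å) of the nine docking results of one peptide
poses = [3.1, 2.2, 4.0, 1.7, 5.6, 2.9, 3.3, 6.1, 2.4]
print(best_pose(poses))

# Hypothetical best RMSD values (Å) of four different peptides
print(success_rate([1.7, 2.5, 3.9, 0.8]))
```

In this example the fourth result is the best model (1.7 Å), and three of the four hypothetical peptides meet the threshold, giving a success rate of 0.75.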

3.3. Assessment of docking in the LEADS-PEP benchmark data set by Plants
molecular modelling method
(Kindly upload this information)

4. Results
Tables 1 and 2 present the results obtained from the modelling of the docking of
the different peptides using both VINA and PLANTS software.
Table 1: Results obtained using VINA
Number of  PDB   Sequence      Best ranking  Best scoring  Best RMSD
residues                       total score   pose          pose
3          1B9J  KLK           -140.148      2.04          0.3
3          2OY2  IAG           -42.373       0.8           -1.2
3          3GQ1  WLF           -70.58        1.9           0.75
3          3BS4  NIF           -111.904      2.6           -0.6
3          2OXW  IAG           -44.521       2.1           -0.59
3          2B6N  APT           -62.9332      4.76          -1.22
4          1TW6  AVPI          -46.159       1.8           -1.1
4          3VQG  VTLV          -88.37        3.9           -0.5
4          1UOP  GFEP          -39.97        3.2           0
4          4C2C  AVPA          -95.62        3.4           -0.9
4          4J44  AIAV          -80.02        2.9           -0.8
5          2HPL  DDLYG         -89.158       3.6           0
5          2V3S  GRFQV         -46.383       3.7           1.03
5          3NFK  GETRL         255.526       5.5           1.7
5          1NVR  ASVSA         -30.793       3.2           0.15
5          4V3I  DLTRP         -73.38        4.6           0.3
5          3T6R  ARTKQ         -112.35       1.6           1.7
6          1SVZ  PQFSLW        -43.61        3.3           2.5
6          3D1E  GQLGLF        -9.09         2.4           0.24
6          3IDG  ALDKWD        -49.53        4             1.03
6          3LNY  EQVSAV        -36.54        5             0.5
6          4NNM  YPTSII        -86.93        3.7           -1.3
6          4Q6H  VQDTRL        -85.6         4.7           -1.4
7          3MMG  ETVRFQS       32.77         5.4           1
7          3Q47  NPISDVD       -91.19        3.5           0
7          3UPV  PTVEEVD       -96.5         4.7           0.7
7          4QBR  ARTKQTA       -86.61        4.7           -1.7
7          3NJG  PQIINRP       46.44         5.5           -0.54
8          1ELW  GPTIEEVD      -79.169       4.17          -2.7
8          3CH8  PQPVDSWV      -83.2148      5.6           0.47
8          4WLB  SLLKKLLD      -101.04       4.6           0.7
8          1OU8  GAANDENY      -98.4172      4.78          1.2
8          1N7F  ATVRTYSC      12.78         4.9           0.83
9          3OBQ  PTPSAPVPL     -85.1         3.8           -0.7
9          4BTB  PPPPPPPPP     -63.13        4.4           -1
9          2W0Z  APPPRPPKP     -21.159       1.3           3.31
9          4N7H  EAPPSYAEYAEV  37.92         2.27          1.3
9          2QAB  KILHRLLQD     824.587       4.5           -0.92
10         1H6W  SLNYIIKVKE    563           4.4           -0.5
10         3BRL  ATSAKATQTD    32.547        5.6           0.05
10         1NTV  NFDNPVYRKT    268.071       4.3           -0.8
10         4DS1  YAESGIQTDL    -78.7908      6.3           -0.1
10         2O02  GLLDALDLAS    794.143       5.9           -1.3
11         1N12  SDVAFRGNLLD   417.073       5.6           3.9
11         2XFX  VGYPKVKEEML   105.37        5.7           -2.8
11         3BFW  DSTITIRGYVR   417.077       5.6           0.8
11         4EIK  SLARRPLPPLP   168.116       3.3           1.2
11         3DS1  ITFEDLLDYYGP  -78.79        6.4           3.1
12         4J8S  RRLPIFNRISVS  -81.05        1.6           3.3
12         2W10  PPPRPTAPKPLL  25.134        4.2           -0.3
12         3JZO  LTFEHYWAQLTS  -64.5362      1.98          3.9
12         4DGY  QLINTNGSWHIN  -84.9614      6.1           2.7
12         2B9H  RRNLKGLNLNLH  116.047       5.1           0.76

Table 2: Results obtained using PLANTS
Number of  PDB   Sequence      Best pose  Best scoring  Best RMSD
residues                                  pose          pose
3          1B9J  KLK           1.8        1.8           1.5
3          2OY2  IAG           0.2        0.9           1.4
3          3GQ1  WLF           2.6        2.7           1.85
3          3BS4  NIF           1.6        4.64          2.2
3          2OXW  IAG           0.9        2.06          1.49
3          2B6N  APT           0.18       2.12          1.4
4          1TW6  AVPI          0.3        1.6           1.4
4          3VQG  VTLV          1.7        3.7           2.2
4          1UOP  GFEP          1.6        2             1.6
4          4C2C  AVPA          0.8        2.5           1.7
4          4J44  AIAV          0.6        2.7           1.4
5          2HPL  DDLYG         2.7        2.7           2.7
5          2V3S  GRFQV         3.59       5.4           2.56
5          3NFK  GETRL         5.8        6.5           4.1
5          1NVR  ASVSA         0.65       2.99          0.5
5          4V3I  DLTRP         2.4        3.13          2.1
5          3T6R  ARTKQ         3.3        4.6           1.6
6          1SVZ  PQFSLW        3.2        5.1           0.7
6          3D1E  GQLGLF        2.5        4.5           2.26
6          3IDG  ALDKWD        2.53       3             1.5
6          3LNY  EQVSAV        2.4        5.3           1.9
6          4NNM  YPTSII        1.8        1.8           3.1
6          4Q6H  VQDTRL        2.5        3.6           3.9
7          3MMG  ETVRFQS       3.7        4.7           2.7
7          3Q47  NPISDVD       2.4        3.9           2.4
7          3UPV  PTVEEVD       2.5        4             1.8
7          4QBR  ARTKQTA       2.2        2.2           3.9
7          3NJG  PQIINRP       3.6        5             4.14
8          1ELW  GPTIEEVD      0.4        5             3.1
8          3CH8  PQPVDSWV      2.77       5.25          2.3
8          4WLB  SLLKKLLD      2.8        5.2           2.1
8          1OU8  GAANDENY      4.2        6.1           3
8          1N7F  ATVRTYSC      3.57       2.9           2.74
9          3OBQ  PTPSAPVPL     2.3        3.3           3
9          4BTB  PPPPPPPPP     0.9        1             1.9
9          2W0Z  APPPRPPKP     4.33       5.8           1.02
9          4N7H  EAPPSYAEYAEV  4.3        4.71          3
9          2QAB  KILHRLLQD     3.48       5.9           4.4
10         1H6W  SLNYIIKVKE    3.3        5.4           3.8
10         3BRL  ATSAKATQTD    3.45       7.9           3.4
10         1NTV  NFDNPVYRKT    3.5        6             4.3
10         4DS1  YAESGIQTDL    2.4        5.5           2.5
10         2O02  GLLDALDLAS    3.6        4.5           4.9
11         1N12  SDVAFRGNLLD   7.5        8.9           3.6
11         2XFX  VGYPKVKEEML   1.9        3.2           4.7
11         3BFW  DSTITIRGYVR   4.4        8.6           3.6
11         4EIK  SLARRPLPPLP   4.4        8.1           3.2
11         3DS1  ITFEDLLDYYGP  5.5        6.4           2.4
12         4J8S  RRLPIFNRISVS  4.9        6.8           1.6
12         2W10  PPPRPTAPKPLL  2.6        5.8           2.9
12         3JZO  LTFEHYWAQLTS  5.3        7             1.4
12         4DGY  QLINTNGSWHIN  5.4        5.6           2.7
12         2B9H  RRNLKGLNLNLH  6.8        4.82          5.58

Table 3, on the other hand, presents the results obtained for the calculation of the
binding affinity according to the docking modelling carried out.
Table 3: Calculation of the docking binding affinity
Number of residues  PDB   Sequence      Binding affinity (kcal/mol)
3                   1B9J  KLK           -9.7
3                   2OY2  IAG           -5.5
3                   3GQ1  WLF           -6.3
3                   3BS4  NIF           -9.6
3                   2OXW  IAG           -5.1
3                   2B6N  APT           0
4                   1TW6  AVPI          -4.9
4                   3VQG  VTLV          -6.8
4                   1UOP  GFEP          -7.4
4                   4C2C  AVPA          -7.8
4                   4J44  AIAV          -6.5
5                   2HPL  DDLYG         -6
5                   2V3S  GRFQV         -3.6
5                   3NFK  GETRL         -4.4
5                   1NVR  ASVSA         -4.4
5                   4V3I  DLTRP         -5.5
5                   3T6R  ARTKQ         -5.3
6                   1SVZ  PQFSLW        -6.5
6                   3D1E  GQLGLF        -5.9
6                   3IDG  ALDKWD        -5.8
6                   3LNY  EQVSAV        -4.8
6                   4NNM  YPTSII        -7.8
6                   4Q6H  VQDTRL        -5.6
7                   3MMG  ETVRFQS       -6.4
7                   3Q47  NPISDVD       -6.2
7                   3UPV  PTVEEVD       -7.4
7                   4QBR  ARTKQTA       -6.4
7                   3NJG  PQIINRP       -5.6
8                   1ELW  GPTIEEVD      -4.8
8                   3CH8  PQPVDSWV      -8.8
8                   4WLB  SLLKKLLD      -4.8
8                   1OU8  GAANDENY      -5.8
8                   1N7F  ATVRTYSC      -4.7
9                   3OBQ  PTPSAPVPL     -6.9
9                   4BTB  PPPPPPPPP     -6
9                   2W0Z  APPPRPPKP     -4.9
9                   4N7H  EAPPSYAEYAEV  -6
9                   2QAB  KILHRLLQD     -4.6
10                  1H6W  SLNYIIKVKE    -3.5
10                  3BRL  ATSAKATQTD    -4.3
10                  1NTV  NFDNPVYRKT    -5.8
10                  4DS1  YAESGIQTDL    -6.5
10                  2O02  GLLDALDLAS    -6.6
11                  1N12  SDVAFRGNLLD   -5.7
11                  2XFX  VGYPKVKEEML   -8.4
11                  3BFW  DSTITIRGYVR   -5.4
11                  4EIK  SLARRPLPPLP   -6.1
11                  3DS1  ITFEDLLDYYGP  0
12                  4J8S  RRLPIFNRISVS  -6
12                  2W10  PPPRPTAPKPLL  -3.7
12                  3JZO  LTFEHYWAQLTS  -4.4
12                  4DGY  QLINTNGSWHIN  -5.8
12                  2B9H  RRNLKGLNLNLH  -3.8

5. Discussion of the results
The objective of the present work is to evaluate the performance of two docking
programs (VINA and PLANTS) in predicting how the peptides contained in the LEADS-PEP
benchmark data set dock to their target proteins. The comparison is based on the root
mean squared deviation (RMSD), which, as mentioned earlier, measures the difference
between the predicted positions of the peptide atoms and the real positions determined
from crystallographic data.
As stated in the introduction, the length of the peptide being modelled must also be
considered, since previous work (Hauser & Windshügel, 2016) demonstrated that RMSD
values increase with the number of amino acid residues.
A similar effect is observed in the present case, as indicated by the comparison of
the RMSD values of the four-residue 4J44 and the eleven-residue 3DS1 peptides. Figures
1 and 2 present the RMSD calculation for these two peptides. As can be observed, the
calculated RMSD is considerably higher for the 3DS1 peptide than for the 4J44 peptide.

Figure 1: RMSD calculation for the four-residue 4J44 peptide

Figure 2: RMSD calculation for the eleven-residue 3DS1 peptide

This difference in RMSD can be understood by comparing the three-dimensional
structures of both peptides, presented in figures 3 and 4, respectively. The greater
length of the 3DS1 peptide makes the model considerably more complex, and therefore
less accurate, than the one used for the shorter 4J44 peptide.

Figure 3: Structure of the 4J44 peptide

Figure 4: Structure of the 3DS1 peptide

As can be observed in figure 5, the effect of peptide length on the RMSD values is
common to all peptides: the values show a positive trend, with the RMSD increasing
slightly as the number of residues increases.
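
The trend described above can be checked numerically. The following is a minimal sketch, assuming a small illustrative subset of the PLANTS best-RMSD-pose values from Table 2 (two peptides each of 3, 4, 8 and 10 residues); the function names are hypothetical and not part of VINA or PLANTS. It averages the RMSD per peptide length and fits an ordinary least-squares line, whose positive slope corresponds to the trend in figure 5.

```python
from collections import defaultdict
from statistics import mean

# (number of residues, PLANTS best-RMSD-pose value) for a few rows of Table 2.
# Illustrative subset only, not the full 53-peptide data set.
ROWS = [
    (3, 1.5), (3, 1.4), (4, 1.4), (4, 2.2),
    (8, 3.1), (8, 2.3), (10, 3.8), (10, 3.4),
]


def mean_rmsd_by_length(rows):
    """Group RMSD values by peptide length and average each group."""
    groups = defaultdict(list)
    for n_residues, rmsd in rows:
        groups[n_residues].append(rmsd)
    return {n: mean(values) for n, values in sorted(groups.items())}


def least_squares_slope(rows):
    """Slope b of the ordinary least-squares line RMSD = a + b * length."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


if __name__ == "__main__":
    for length, value in mean_rmsd_by_length(ROWS).items():
        print(f"{length:>2} residues: mean RMSD {value:.2f}")
    print(f"slope: {least_squares_slope(ROWS):.3f} per residue")
```

Running the same functions on all 53 rows, rather than on this subset, would give the trend for the complete data set.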

Figure 5: Comparison of the RMSD values obtained for the different peptides

Having said this, it is important to compare how well the two programs predict the
three-dimensional structure and docking behavior of the peptides in the LEADS-PEP data
set. Table 4 compares the RMSD values obtained with the two methods. As can be
observed, the RMSD value obtained with VINA is generally smaller than the value
obtained for the same peptide with PLANTS: the RMSD was lower with VINA for 45 of the
53 peptides analyzed.

Table 4: Comparison of the RMSD values obtained using VINA and PLANTS methods
Number of residues  PDB   Sequence      RMSD (VINA)  RMSD (PLANTS)  Best method
3                   1B9J  KLK           0.3          1.5            VINA
3                   2OY2  IAG           -1.2         1.4            VINA
3                   3GQ1  WLF           0.75         1.85           VINA
3                   3BS4  NIF           -0.6         2.2            VINA
3                   2OXW  IAG           -0.59        1.49           VINA
3                   2B6N  APT           -1.22        1.4            VINA
4                   1TW6  AVPI          -1.1         1.4            VINA
4                   3VQG  VTLV          -0.5         2.2            VINA
4                   1UOP  GFEP          0            1.6            VINA
4                   4C2C  AVPA          -0.9         1.7            VINA
4                   4J44  AIAV          -0.8         1.4            VINA
5                   2HPL  DDLYG         0            2.7            VINA
5                   2V3S  GRFQV         1.03         2.56           VINA
5                   3NFK  GETRL         1.7          4.1            VINA
5                   1NVR  ASVSA         0.15         0.5            VINA
5                   4V3I  DLTRP         0.3          2.1            VINA
5                   3T6R  ARTKQ         1.7          1.6            PLANTS
6                   1SVZ  PQFSLW        2.5          0.7            PLANTS
6                   3D1E  GQLGLF        0.24         2.26           VINA
6                   3IDG  ALDKWD        1.03         1.5            VINA
6                   3LNY  EQVSAV        0.5          1.9            VINA
6                   4NNM  YPTSII        -1.3         3.1            VINA
6                   4Q6H  VQDTRL        -1.4         3.9            VINA
7                   3MMG  ETVRFQS       1            2.7            VINA
7                   3Q47  NPISDVD       0            2.4            VINA
7                   3UPV  PTVEEVD       0.7          1.8            VINA
7                   4QBR  ARTKQTA       -1.7         3.9            VINA
7                   3NJG  PQIINRP       -0.54        4.14           VINA
8                   1ELW  GPTIEEVD      -2.7         3.1            VINA
8                   3CH8  PQPVDSWV      0.47         2.3            VINA
8                   4WLB  SLLKKLLD      0.7          2.1            VINA
8                   1OU8  GAANDENY      1.2          3              VINA
8                   1N7F  ATVRTYSC      0.83         2.74           VINA
9                   3OBQ  PTPSAPVPL     -0.7         3              VINA
9                   4BTB  PPPPPPPPP     -1           1.9            VINA
9                   2W0Z  APPPRPPKP     3.31         1.02           PLANTS
9                   4N7H  EAPPSYAEYAEV  1.3          3              VINA
9                   2QAB  KILHRLLQD     -0.92        4.4            VINA
10                  1H6W  SLNYIIKVKE    -0.5         3.8            VINA
10                  3BRL  ATSAKATQTD    0.05         3.4            VINA
10                  1NTV  NFDNPVYRKT    -0.8         4.3            VINA
10                  4DS1  YAESGIQTDL    -0.1         2.5            VINA
10                  2O02  GLLDALDLAS    -1.3         4.9            VINA
11                  1N12  SDVAFRGNLLD   3.9          3.6            PLANTS
11                  2XFX  VGYPKVKEEML   -2.8         4.7            VINA
11                  3BFW  DSTITIRGYVR   0.8          3.6            VINA
11                  4EIK  SLARRPLPPLP   1.2          3.2            VINA
11                  3DS1  ITFEDLLDYYGP  3.1          2.4            PLANTS
12                  4J8S  RRLPIFNRISVS  3.3          1.6            PLANTS
12                  2W10  PPPRPTAPKPLL  -0.3         2.9            VINA
12                  3JZO  LTFEHYWAQLTS  3.9          1.4            PLANTS
12                  4DGY  QLINTNGSWHIN  2.7          2.7            Both
12                  2B9H  RRNLKGLNLNLH  0.76         4.82           VINA

Finally, considering that a method describes the docking process accurately whenever
the RMSD value is below 2.5, we can conclude that VINA not only provides the most
accurate results for most of the peptides, but also provides a satisfactory model for
the calculation of the binding affinity during the docking process for 48 of the 53
peptides studied. In contrast, PLANTS provides an accurate model for only 28 of the
53 peptides studied.
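
The two criteria used in this comparison, the lower RMSD per peptide and the 2.5 acceptance threshold, can be sketched as follows. This is an illustration under stated assumptions: the sample rows are copied from Table 4, the function names are hypothetical, and the threshold is treated as strict ("below 2.5"), which is one reading of the criterion above.

```python
def best_method(rmsd_vina, rmsd_plants):
    """Name the method with the lower RMSD, or "Both" on a tie."""
    if rmsd_vina < rmsd_plants:
        return "VINA"
    if rmsd_plants < rmsd_vina:
        return "PLANTS"
    return "Both"


def count_below(values, threshold=2.5):
    """Count RMSD values strictly below the acceptance threshold (assumed strict)."""
    return sum(1 for value in values if value < threshold)


# PDB ID -> (RMSD VINA, RMSD PLANTS); a few rows copied from Table 4.
SAMPLE = {
    "1B9J": (0.3, 1.5),
    "3T6R": (1.7, 1.6),
    "4DGY": (2.7, 2.7),
    "3JZO": (3.9, 1.4),
}

if __name__ == "__main__":
    for pdb_id, (vina, plants) in SAMPLE.items():
        print(pdb_id, best_method(vina, plants))
    print("VINA below threshold:", count_below(v for v, _ in SAMPLE.values()))
    print("PLANTS below threshold:", count_below(p for _, p in SAMPLE.values()))
```

Whether a value exactly equal to 2.5 counts as accurate changes the totals, so the choice of a strict or inclusive comparison should be stated alongside the reported counts.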
This result is again in agreement with the literature, according to which VINA
provides satisfactory results for the docking of the small peptides contained in the
LEADS-PEP data set as long as the peptide length is kept below 12 amino acid residues.
It should be noted, however, that the validity of this method for predicting the
docking behavior of longer peptides should be carefully analyzed, considering the
dramatic effect that peptide length has on the complexity of the model and on the
resulting RMSD value.

6. Conclusions
The main conclusions that can be derived from the study carried out during this
Master Thesis can be summarized as follows:

• Despite not having been designed for this purpose, the VINA modelling software
can successfully be applied to the modelling of the docking behavior of small
peptides.
• The VINA modelling software seems to be much better at predicting the
docking behavior than other modelling software such as PLANTS.
• The validity of the VINA modelling software for predicting the docking
behavior of longer peptides should be carefully analyzed.

References
FMSH team (2009). 3D modelling of proteins. Retrieved February 21, 2017, from
http://biologii.net/world/prot4.html
Hauser, A.S. & Windshügel, B. (2016). LEADS-PEP: A benchmark data set for assessment
of peptide docking performance. Journal of Chemical Information and Modeling, 56,
188-200.
Huang, S.-Y. (2014). Search strategies and evaluation in protein-protein docking:
principles, advances and challenges. Drug Discovery Today, 19, 1081-1096.
Meyer, E.F., Swanson, S.M. & Williams, J.A. (2000). Molecular modelling and drug
design. Pharmacology & Therapeutics, 85, 113-121.

Kindly see the attached edited file. It is now 44 pages long, so within your teacher's new limit of 40 pages. I've highlighted, either through comments or in yellow, the things you still need to focus on to get the final paper. In this sense, there are some data missing from the final results table regarding the solvation and electrostatic energies of a couple of peptides. I guess you forgot to include them in the file, so kindly check whether you have them so that the table is complete :)
I'm at your complete disposal in case you need any further assistance regarding this task.
Best regards,
Carmen

(TITLE)
(NAME)
Master Thesis
(UNIVERSITY)
(DATE)

Table of contents
Acknowledgements
Abstract
1. Introduction
1.1. What is molecular modelling?
1.2. What are the steps involved in molecular modelling for the assessment of the
docking reaction between peptides and proteins?
1.3. Previous research: Development of a benchmark data set for the assessment of
peptide docking by LEADS-PEP
2. Objectives
3. Methods
3.1. Materials
3.2. Assessment of docking in the LEADS-PEP benchmark data set by Vina molecular
modelling method
3.3. Assessment of docking in the LEADS-PEP benchmark data set by Plants molecular
modelling method
4. Results
5. Discussion of the results
6. Conclusions
References

Acknowledgements
I would like to thank the University of …. for having provided me with all
the necessary means to carry out the research presented in this Master
Thesis. Furthermore, I would like to thank my tutor …. for his/her guidance and
support throughout the whole process.
I am especially thankful to my family and friends for their constant support
and encouragement during this year and throughout my academic career.
Finally, I would like to thank the …. for the financial assistance provided,
which has enabled me to carry out the work described in this Master Thesis.

Abstract
Two docking programs, namely VINA and PLANTS, have been analyzed for their ability
to predict the docking behavior of the small peptides contained in the LEADS-PEP
data set.
The comparison of the results enabled us to verify previously published results
according to which VINA is especially useful for evaluating the docking of small
peptides, even though it was not initially developed for that purpose. The RMSD
values obtained clearly indicate that VINA provided accurate results for 45 of the
53 peptides analyzed in this study. In contrast, PLANTS was able to describe the
docking behavior of only 28 of the 53 peptides analyzed.

1. Introduction
1.1. What is molecular modelling?

Molecular modelling is the scientific discipline that uses computers to build and
refine the structure of chemical molecules by means of models based on quantum
mechanics. Most of these models are based on solving the Schrödinger equation
presented below:

𝐻𝛹 = 𝐸𝛹

where H represents the Hamiltonian operator, Ψ is the wave function and E is the
energy.

Solving this equation is relatively simple for the hydrogen atom, but becomes
highly complicated for complex molecules. For this reason, several approximations
have been developed in order to calculate solutions for complex systems such as the
proteins and small peptides analyzed in the present thesis.
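
To illustrate why the hydrogen atom counts as the simple case, its energies can be written in closed form. The expression below is the standard textbook result for the non-relativistic hydrogen atom; it is quoted here for orientation and is not derived or used in this thesis.

```latex
E_n = -\frac{m_e e^4}{8 \varepsilon_0^2 h^2} \cdot \frac{1}{n^2}
    \approx -\frac{13.6\ \text{eV}}{n^2}, \qquad n = 1, 2, 3, \dots
```

Here m_e is the electron mass, e the elementary charge, ε_0 the vacuum permittivity, h the Planck constant and n the principal quantum number. No comparable closed-form solution exists for many-electron molecules, which is why the approximations mentioned above are required.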

Molecular modelling enables scientists to predict not only the structure of a
given molecule, but also its reactivity under pre-established conditions. Such
predictions offer a straightforward way to validate the modelling results: we can
predict a given property, such as the acidity of the molecule, and compare it with
the result of an acid-base titration experiment. Since several approximations are
used during the modelling process, only an approximate result will be obtained in
most cases.

However, the possibility of validating the calculated results is critical for
deciding which calculation method is best for a given application. The common
practice consists of modelling the chemical structure of the molecule with
different approximation methods and then comparing the results to evaluate which
one best describes the experimentally observed variable.

Molecular modelling techniques are commonly used in physical chemistry,
inorganic chemistry and biochemistry. They have been applied extensively to predict
the structure and reactivity of small organic molecules, polymers, inorganic solids,
inorganic liquids, liquid crystals and proteins. In biochemistry, they are especially
useful for the conformational analysis of proteins, peptides and enzymes, among
others. Hence, they can be used to predict the sequence of amino acids or the
three-dimensional structure of the active site.

The main attraction of such models is their use in the design of new drugs.
The application of molecular modelling to drug design, however, has only become
possible after the exponential increase, over the last five decades, in the number
of protein and enzyme structures determined by techniques such as nuclear magnetic
resonance (Meyer, Swanson & Williams, 2000). The inclusion of these structures in
databases has enabled scientists to access the required information in a timely
manner instead of having to purify the protein and characterize its structure.

Furthermore, the advances in computing of recent years have enabled:

• An exponential decrease in the time required to perform each calculation
• A decrease in the hardware requirements, meaning that nowadays anybody can
run a simple calculation without needing access to especially powerful
computers

1.2. What are the steps involved in molecular modelling for the assessment of the
docking reaction between peptides and proteins?

Basically, the procedure used in molecular modelling for the assessment of the
complexation or docking reaction between peptides and proteins can be summarized as
follows:

1. Identification and characterization of the target proteins: In this stage, it is
important that the target proteins are identified and characterized through
molecular modelling to obtain their three-dimensional structure (Figure 1) and,
most importantly, the sequence of amino acids present in the active site of the
protein and to which the peptide will need to bind.

Figure 1. Sample of a three-dimensional structure of a protein obtained through
molecular modelling (Biologii.net, n.d.)

2. Once the target proteins have been identified, possible peptides are selected
from the available database according to the identified sequence of amino acids
and the structure of the active site in the protein. Here it is important to
consider the chemical reactivity between the different functional groups present
in the amino acids. Hence, we should look for cysteine residues that might form
an intermolecular disulfide bond between the protein and the peptide, or for
acidic and basic amino acids, as they could react through an acid-base reaction
and form a nearly ionic bond. The formation of these bonds significantly
increases the stability of the complex, therefore leading to a higher-affinity
docking process. In addition to the chemical reactivity, we should look for
possible complementarity between the structure of the peptide and that of the
protein (Figure 2).

Figure 2. Dock formed between the protein (green) and the peptide (purple) (Huang,
2014)

3. Once a set of possible target peptides that can interact favorably with the
protein has been identified, their three-dimensional structure is either
retrieved from the database (if available) or modelled. In this stage, special
attention is paid to the modelling of the active site of the peptide that will
interact with the protein, as well as to any surrounding amino acids that can
cause steric repulsion and destabilize the protein-peptide complex.
4. When the structures of both the protein and the possible binding peptides are
well characterized, the docking process is modelled. This can be accomplished
with different tools and software. The objective of these programs is to model
the cocrystallized peptide-protein complex and to characterize the binding
strength between the peptide and the protein by analyzing the energy change in
the system between the separately crystallized peptide and protein and the
cocrystallized complex.
5. The data obtained from the modelling are interpreted by calculating the root
mean square deviation or RMSD. As in any regression model, the calculation of
the RMSD involves comparing the predicted positions of the atoms in the model
with the experimental ones, taken from the crystallographic and NMR data present
in the database. The best model is therefore the one that provides the lowest
RMSD value, as this implies that it is able to calculate most closely the
positions of the atoms present in the binding site of both the peptide and the
protein.
6. Finally, the results obtained are validated through a different set of
experiments. A common practice is to select the two or three peptides that
showed the strongest interaction with the protein according to the model, and
to measure the binding constant by preparing different mixtures of peptide and
protein and monitoring the change in a physical or chemical property during the
reaction. As an example, if the model predicts that the interaction between the
peptide and the protein takes place through an acid-base reaction, the change in
the pH of the solution could be used to follow the docking reaction
experimentally.
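
The RMSD calculation described in step 5 can be sketched in a few lines. This is a minimal illustration, assuming that the predicted and experimental atoms are already matched one-to-one in the same order and expressed in the same coordinate frame (no superposition is performed); the function name and the coordinates are hypothetical, and docking programs may apply additional conventions such as restricting the calculation to backbone atoms.

```python
import math


def rmsd(predicted, reference):
    """Root mean square deviation between two equally ordered lists of (x, y, z)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared_sum / len(predicted))


if __name__ == "__main__":
    # Hypothetical three-atom example: the docked pose is the crystal pose
    # shifted by one unit along the y axis.
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    docked = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
    print(f"RMSD = {rmsd(docked, crystal):.2f}")
```

By construction the result is never negative, and it is zero only when every predicted atom coincides with its experimental position.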

1.3. Previous research: Development of a benchmark data set for the assessment of
peptide docking by LEADS-PEP

Hauser and Windshügel (2016) have recently used molecular modelling to
evaluate the LEADS-PEP benchmark data set, which comprises 53 different
protein-peptide complexes, for the assessment of docking performance. The
formation of such complexes is important from a biological point of view, since
they regulate several essential cellular processes. They have been estimated to
account for up to 40% of all interactions within the cell, and their formation or
dissociation is currently being used as the basis of peptide-based therapeutics
for the treatment of, for example, cancer, hepatitis C or metabolic diseases.

Taking this into account, the work of Hauser and Windshügel is a very good
example of how molecular modelling can assist scientists in developing new drugs
that are more effective and cause fewer side effects. The interaction between the
peptide (the designed drug) and the protein in the cell is similar to that between
an enzyme and its substrate or between an antibody and its antigen. This
interaction is therefore presumed to be highly specific and very stable.

However, despite the importance that research in this field would have for the
development of a more systematic drug design protocol, very few experiments have
been carried out so far. Although several programs are currently able to analyze
peptide docking data, such as the ones used in the present work, a comparison of
the performance of these programs on a single data set is still missing.
Additionally, the fact that the databases are not publicly available is one of the
most important drawbacks of applying molecular modelling to the evaluation of the
protein-peptide docking process, as it limits the comparability of the results
obtained with the different modelling methods.

Considering, however, that molecular modelling has proven to be a valuable
technique for modelling complexation reactions of small organic and inorganic
molecules, scientists expect similar results in the evaluation of peptide docking
processes.

In this study, Hauser and Windshügel evaluated a total of 53 different peptides
of well-characterized sequence contained in the LEADS-PEP database. This database
has the advantage of being publicly available. The peptides were selected to cover
the full range of acid-base properties, as evaluated through the H-bond donor and
H-bond acceptor properties of the amino acids present in the peptide's structure.
Doing so enabled the researchers to evaluate the influence of the pH on the
stability of the complexes formed between the peptides and the protein.

The peptide length in this study was limited to a maximum of 12 amino acids
since, according to the experiments carried out, the computing time increased
exponentially with the length of the peptide due to the higher number of atoms
that were introduced in the model (Figure 3). In addition to the peptide's length,
the computing time also depended on the exact model being used. This can be
attributed to the number of simplifications made: simpler models tend to require
less computing time, but generally provide less accurate results.

Figure 3. Effect of peptide length and model on the computing time required to analyze
the structure and stability of the peptide dock to the target protein (Hauser &
Windshügel, 2016)

Another important result found by these authors is that the peptide's length
significantly affects the accuracy of the results. Hauser and Windshügel found
that all the docking programs used in this experiment were able to model
accurately the backbone structure of short peptides (with just 3 or 4 residues)
but failed to model longer peptides accurately. Hence, as can be observed in
figure 4, the RMSD values obtained for the different models increased with the
peptide length:

Figure 4. Average RMSD values obtained for the prediction of the structure of the
position of the backbone of the peptide by the AutoDock method using standard
accuracy. Similar performance was obtained for other modelling methods

According to the results of the study, the modelling method providing the most
accurate results was Surflex, which was able to provide an accurate calculation of
the position of 13 different peptide conformations out of the 53 peptides
analyzed. It should be noted, however, that the best modelling method in terms of
RMSD depended on the exact sequence of the peptide being considered, so that each
peptide needed to be modelled with all the methods before evaluating which one
provided the most accurate result. This represents a very significant drawback, as
the need to use several modelling methods dramatically increases the required
computing time.

It should be noted, however, that the use of the high accuracy mode did not play
a significant role in the RMSD values obtained in comparison with the standard
accuracy mode. Taking this result into account, and considering that the computing time
is greatly increased by selecting the high accuracy mode, the use of such mode does not
seem to be a valuable tool in the description of the docks formed by short peptides.

The main conclusions that can be derived from the work carried out by Hauser
and Windshügel (2016) can be summarized as:



The LEADS-PEP database fills in an...

