Programming Lab 6 Report

User Generated

Nyfnu123

[Figure "hoxannot": rooted gene tree of homeobox proteins annotated with Pfam domain content. Leaves are proteins from N. vectensis, B. belcheri, D. gigantea, H. vulgaris, M. yessoensis, A. planci, and H. sapiens, including Hox, MOX, GBX, CDX, engrailed, BarH-like, T-cell leukemia homeobox, GS homeobox, pancreas/duodenum homeobox, and motor neuron and pancreas homeobox proteins.]

Pfam domains listed in the figure legend:
• pfam00046 Homeobox (Homeobox domain)
• pfam16326 ABC_tran_CTD (ABC transporter C-terminal domain)
• pfam08317 Spc7 (Spc7 kinetochore protein)
• pfam05109 Herpes_BLLF1 (Herpes virus major outer envelope glycoprotein, BLLF1)
• pfam05278 PEARLI-4 (Arabidopsis phospholipase-like protein, PEARLI 4)
• pfam04617 Hox9_act (Hox9 activation region)
• pfam10525 Engrail_1_C_sig (Engrailed homeobox C-terminal signature domain)
• pfam04218 CENP-B_N (CENP-B N-terminal DNA-binding domain)
• pfam04731 Caudal_act (Caudal-like protein activation region)
• pfam13293 DUF4074 (Domain of unknown function, DUF4074)
• pfam12045 DUF3528 (Protein of unknown function, DUF3528)
• pfam05920 Homeobox_KN (Homeobox KN domain)
• pfam17380 DUF5401 (Family of unknown function, DUF5401)
• pfam06583 Neogenin_C (Neogenin C-terminus)

What information should be in your results?
What figures should be in your results?

To get answers: identify homologs → align homologs → produce gene tree → reconcile gene tree and species tree.
How has domain content changed with gene family evolution?
Getting ready for the methods:
- Conceptual map
- Project Repository

Analysis Pipeline (Song et al. 2015)
Cylinder shapes indicate data, arrows indicate data flows, and rectangular shapes indicate programs (process, name).

In Class Activity (8 points)
• Draw the pipeline that you used to identify and analyze your gene (Labs 5-8), showing:
– Data at each step (cylinders)
– Programs (process + name)
– Arrows (data flow)
• Use https://sketch.io/sketchpad/, for example.

Project Repository
• You will be given a "Project Repository".
• This is your project's pipeline.
• For extra credit:
– All of the commands present
– All of the data files and result files present
– Someone could just clone, run the commands with the provided data, and get the resulting output!
– Nothing extra!!
– (Right now, this is spread out among Labs 3-6.)

Results: Describe the outcomes of your analyses. Supplement your description with figures and tables, and refer to these in your text.
Your results section should have:
• A clear and complete narrative description, point by point, of each result. Walk the reader through the results, what they mean, and how they are to be interpreted.
• An accurate and scientifically sound interpretation of your results using appropriate technical terms.
• At least two figures and/or tables that appropriately illuminate the results. Relevant figures or tables from your analysis are placed either on separate pages after the main text or within the text.
• Figures are numbered consecutively, and each figure is accompanied by a legend. Figures are referred to directly in the main text. Axes are labeled with units.
• Tables are numbered consecutively and referred to directly in the main text, and each table includes a title.
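Basic alignment statistics (number of homologs, alignment length, average percent identity) are a natural first table for a Results section. The following is a minimal sketch of how they could be computed from an already aligned set of sequences. The function names and the toy alignment are hypothetical, and percent identity is defined here over columns where both sequences have a residue, which is only one of several definitions in use.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def alignment_summary(aligned: dict[str, str]) -> dict[str, float]:
    """Number of sequences, alignment length, mean pairwise percent identity."""
    seqs = list(aligned.values())
    pairs = list(combinations(seqs, 2))
    mean_id = (
        sum(percent_identity(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    )
    return {
        "n_homologs": len(seqs),
        "alignment_length": len(seqs[0]) if seqs else 0,
        "mean_percent_identity": mean_id,
    }


# Invented toy alignment, for illustration only.
toy = {"seq1": "GEW-A", "seq2": "GTWKA", "seq3": "GTY-A"}
summary = alignment_summary(toy)
print(summary)
```

For a real analysis the dictionary would be filled from the alignment file produced in lab; the reported average depends on the identity definition chosen, so the definition should be stated in the table title or legend.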
Example of referring to a figure in the text: "Our results strongly suggest that Gremlin production increases as an exponential function of the number of times Mogwai are fed after midnight (Figure 1)."

A legend is written for each figure. A legend is a complete description of the figure and can stand alone from the text.

Provide just the main results that shed light on your research question, not every detail. Keep focused on your research question as described in your Introduction (and adjust your question, if necessary).

Results (for starters, to be expanded):
- How many homologs? Average percent identity. Length of the alignment.
- Figures:
  - Rooted gene tree, with domain content
  - Rooted gene tree, with bootstrap support
  - Gene tree: Notung reconciliation
  - Gene tree: Notung reconciliation displayed with RecPhyloVisu

Motifs and Domains

Examples of protein domains
• This protein (hemocyanin) has two distinct domains (blue and green) which are connected by a short linker (red).
[Figure: the amino acid chain of hemocyanin drawn as a bar from residue 1 to residue 394, with the residues forming the "green" domain, the "blue" domain, and the linker marked.]
• This enzyme (laccase) has three distinct domains (each colored differently).
[Figures: ribbon diagram of laccase; space-filling diagram of laccase.]

Domains are functional elements of proteins. Some examples of biochemical functions of domains:
• An enzyme's catalytic domain has the function of catalyzing the conversion of a reactant into a product.
• A structural protein domain has the function of influencing the shape of a cell.
• The binding domain of a transport protein has the function of carrying a ligand from one location to another.

[Figure: ribbon diagrams of β-propeller proteins containing 4-8 blades, each made up of WD40 domains (Jawad and Paoli 2002).]

Domain architectures of the different MAGUK classes.
The membrane-associated guanylate kinases (MAGUKs) are a superfamily of proteins. The MAGUKs are defined by their inclusion of PDZ, SH3, and GUK domains, although many of them also contain regions homologous to CaMKII, WW, and L27 domains (de Mendoza et al. 2010).
• PDZ domain: a common structural domain of 80-90 amino acids found in the signaling proteins of bacteria, yeast, plants, viruses, and animals. Proteins containing PDZ domains play a key role in anchoring receptor proteins in the membrane to cytoskeletal components.
• SH3 domain: a distinct motif that binds target proteins, including proteins associated with the actin cytoskeleton, through sequences containing proline and hydrophobic amino acids.
• GUK (guanylate kinase) domain: in MAGUK proteins the GUK domain is enzymatically inactive; instead, the domain mediates protein-protein interactions and associates intramolecularly with the SH3 domain.

Why is it useful to identify motifs and domains in families of proteins?
• To identify the functionally important residues and patterns in a given domain.
• To predict the function of a new protein by comparing its sequence to the sequences of domains with known functions.
• To evaluate evidence for patterns of homology among orthologs and paralogs, i.e. partial homology.
• To trace the evolution of protein function.

New genes from parts of old genes
But how could this happen? One way to make a new gene: use the parts of old genes.
[Figure (source: Sylvia Nagl). Labels under "Structural modules" and "Domain origins": EGF domain, epidermal growth factor (EGF), fibronectin finger domain, fibronectin, plasminogen, kringle domain, vitamin K-dependent calcium-binding domain (osteocalcin), trypsin-like serine protease. Labels under "Mosaic proteins": tissue plasminogen activator, prothrombin, urokinase, plasminogen.]
When domains are repeatedly found in diverse proteins they are called structural modules. Long structural modules are likely homologous. (Shorter structural modules may be convergent.)

Domains may contain motifs.
Hemocyanin as an example: the "green" domain of hemocyanin contains this copper-binding motif:
H-X3-H-X22-37-H
[Figure: location of the copper-binding motif.]

A group of proteins that share a domain in common constitutes a FAMILY. Family members are evolutionarily related (homologous) and their domains have sequence similarity.
Family members can share a domain in common in a number of ways (Figure 8.2 from Bioinformatics and Functional Genomics by J. Pevsner):
• A domain may extend essentially across the length of a protein.
• Domains may contain highly related stretches of amino acids that form only a subset of each protein's sequence.
• A domain may be repeated within a single protein.

How are domains identified and classified? (Ponting and Russell 2002)
A. By sequence- or structure-based families related by common ancestry (homology).
B. By function.
C. By shared GC content.
D. By numbers of duplications and losses, as determined by reconciliation of gene and species trees.

• Identification and classification by function? Most domain families contain representatives with different functions. Providing a standard definition of function is difficult.
• Identification and classification of domains by sequence (homology or structure).
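The copper-binding motif given for hemocyanin above, H-X3-H-X22-37-H, can be searched for with an ordinary regular expression. The sketch below assumes the notation means "H, any three residues, H, any 22 to 37 residues, H"; the function name and the test sequence are hypothetical and chosen only to illustrate the search.

```python
import re

# H-X3-H-X22-37-H read as: His, any 3 residues, His, any 22 to 37 residues, His.
COPPER_MOTIF = re.compile(r"H.{3}H.{22,37}H")


def find_motif(sequence: str, pattern: re.Pattern = COPPER_MOTIF) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping match.

    Alignment gap characters are removed first, so positions refer to the
    ungapped protein sequence.
    """
    ungapped = sequence.replace("-", "")
    return [(m.start() + 1, m.group()) for m in pattern.finditer(ungapped)]


# Invented sequence: the motif starts at residue 3.
toy_sequence = "AA" + "H" + "AAA" + "H" + "A" * 22 + "H" + "AA"
matches = find_motif(toy_sequence)
print(matches)
```

A pattern this short and permissive can match by chance in unrelated proteins, so a hit is a candidate site to inspect in the alignment rather than evidence of homology.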
How are motifs and domains identified in protein families?
By aligning family members in a global multiple sequence alignment. Motifs and/or domains can then be identified as conserved regions of the alignment. Sometimes it is easy to align the sequences, and these conserved regions are obvious and can be identified "by eye."

Below is an example of a multiple sequence alignment of a family of proteins. A conserved copper-binding motif is known to exist in these proteins. Examine the alignment carefully: can you identify the region containing the motif?

Seq1 APIPPPDLKSCGVAHIDDKGTEVSY--SCCPPVPDDIDSVPYYKFPPMTKLR-IRPPAHA 57
Seq2 APAPPPDLSSCSIARINEN-QVVPY--SCCAPKPDDMEKVPYYKFPSMTKLR-VRQPAHE 56
Seq3 APVPIPDLTKCVI-P---PSGAPVP-INCCPPFSK--DIIDFKYP-SFEKLR-VRPAAQL 51
Seq4 SPISPPDLSKCVP-PSDLPSGTTPPNINCCPPYST--KITDFKFP-SNQPLR-VRQAAHL 55
Seq5 APIQAPDLGDCHQ-PVDVPATAPAI--NCCPTYSAGTVAVDFAPPPASSPLR-VRPAAHL 56
Seq6 APILAPDLSTCGP-PADLPASARPT--VCCPPYQS--TIIDFKLPPRSAPLR-VRPAAHL 54
Seq7 APIQAPDISKCG--TATVPDGVTPT--NCCPPVTT--KIIDFQLPSSGSPMR-TRPAAHL 53
Seq8 APIQAPEISKCVVPPADLPPGAVVD--NCCPPVAS--NIVDYKIP-VVTTMK-VRPAAHT 54
Seq9 APIL-PDVEKCTLSDALWDGSVGDH---CCPPPFDLNITKDFEFKNYHNHVKKVRRPAHK 56

Seq1 A--DEEYVAKYQLATSRMRELDK-DPFDPLGFKQQANIHCAYCNGAYKIGGK---ELQVH 111
Seq2 A--NEEYIAKYNLAISRMKDLDKTQPLNPIGFKQQANIHCAYCNGAYRIGGK---ELQVH 111
Seq3 V--DDDYFAKYNKALELMRALPDDDPRS---FSQQAKIHCAYCVGGYKQLGYPEIELSVH 106
Seq4 V--DNEFLEKYKKATELMKALPSNDPRN---FTQQANIHCAYCDGAYSQIGFPDLKLQVH 110
Seq5 A--DRAYLAKYERAVSLMKKLPADDPRS---FEQQWRVHCAYCDGAYDQVGFPGLEIQIH 111
Seq6 V--DADYLAKYKKAVELMRALPADDPRN---FVQQAKVHCAYCDGAYDQIGFPDLEIQIH 109
Seq7 V--SKEYLAKYKKAIELQKALPDDDPRS---FKQQANVHCTYCQGAYDQVGYTDLELQVH 108
Seq8 M--DKDAIAKFARAVDLMRALPGDDPRN---FYQQALVHCAYCNGGYDQVNFPDQEIQVH 109
Seq9 AYEDQEWLNDYKRAIAIMKSLPMSDPRS---HMQQARVHCAYCDGSYPVLGHNDTRLEVH 113

Seq1 FSWLFFPFHRWYLYFYERILGSLINDPTFALPYWNWDHPKGMRIPPMFDREGSSLYDEKR 171
Seq2 NSWLFFPFHRWYLYFHERIVGKFIDDPTFALPYWNWDHPKGMRFPAMYDREGTSLFDVTR 171
Seq3 NSWLFLAFHRWYIYFYERILGSLINDPTFAIPFWNFDAPDGMQIPSIFTNPNSSLYDLKR 166
Seq4 GSWLFFPFHRWYLYFYERILGSLINDPTFALPFWNYDAPDGMQLPTIYADKASPLYDELR 170
Seq5 SCWLFFPWHRMYLYFHERILGKLIGDETFALPFWNWDAPDGMSFPAMYANRWSPLYDPRR 171
Seq6 NSWLFFPWHRFYLYSNERILGKLIGDDTFALPFWNWDAPGGMQFPSIYTDPSSSLYDKLR 169
Seq7 ASWLFLPFHRYYLYFNERILAKLIDDPTFALPYWAWDNPDGMYMPTIYASSPSSLYDEKR 168
Seq8 NSWLFFPFHRWYLYFYERILGKLIGDPSFGLPFWNWDNPGGMVLPDFLNDSTSSLYDSNR 169
Seq9 ASWLFPSFHRWYLYFYERILGKLINKPDFALPYWNWDHRDGMRIPEIFKEMDSPLFDPNR 173

Seq1 NQNHRNGTIIDLGHFGKDVRTPQL-----
Seq2 DQSHRNGAVIDLGFFGNEVETTQL-----
Seq3 DSRHQPPRIIDLNYNKDTEDPGPNYPPSAE
Seq4 NASHQPPTLIDLNFCDIGSDIDRN-----
Seq5 NQAHLPPFPLDLDYSGTDTNIPKD-----
Seq6 DAKHQPPTLIDLDYNGTDPTFSPE-----
Seq7 NAKHLPPTVIDLDYDGTEPTIPDD-----
Seq8 NQSHLPPVVVDLGYNGADTDVTDQ-----
Seq9 NTNHLD-KMMNLSFVSDEEGSDVN----ED

[Conservation line under each block: column positions of the "* : ." symbols are not recoverable.]

Answer: [Figure: the same alignment with the copper-binding motif inside a red box.] The motif is located within a conserved section of sequence, which is marked with a yellow box. (The "* : ." symbols printed below an alignment indicate conserved residues.)

Sometimes (as in this example) it is easy to align family members and identify conserved regions that are likely to be important to the function of the protein. However, for distantly related sequences, it may be very difficult to even align the sequences properly, let alone detect conserved sequence patterns. These situations require the use of sensitive statistical methods.

Protein motifs and domains are consensus sequence patterns.
• Motif: a short conserved sequence pattern; can be just a few amino acid residues, up to ~20. Examples: Y-X-Y and C-X4-C-X12-H-X3-H
• Domain: a longer conserved sequence pattern which adopts a particular three-dimensional structure and is an independent functional and structural unit; typically 40-700 residues.

Example of a two-domain protein: this protein (troponin C) is composed of a single amino acid chain, but each half of the chain forms an independent structural and functional unit, a domain.

NOTE: Many short motifs are NOT specific to a particular protein family. Thus, their occurrence does not indicate homology.
Example: the protein kinase C phosphorylation site has this 3-residue motif:
S/T - X - R/K (S or T, followed by any residue, followed by R or K)
This is a common motif that occurs in many unrelated proteins. These occurrences represent evolutionary convergence for a common function.

Motifs and domains are FUNCTIONAL elements of proteins.
Some examples of biochemical functions of domains:
• An enzyme's catalytic domain has the function of catalyzing the conversion of a reactant into a product.
• A structural protein domain has the function of influencing the shape of a cell.
• The binding domain of a transport protein has the function of carrying a ligand from one location to another.
Some examples of the functions of motifs:
• The Y's of this tyrosine motif have the function of interacting with specific residues of a protein to stabilize its structure: Y-X-Y
• The H's and C's of this zinc finger motif have the function of binding zinc ions:
C-X4-C-X12-H-X3-H

How are motifs and domains in protein families represented?

1. Regular expressions/patterns
A multiple sequence alignment is converted to a consensus sequence called a regular expression or pattern.

Example:
Multiple sequence alignment:
seq1 GEW
seq2 GTW
seq3 GTY
seq4 GRW
seq5 GKW
seq6 GAW
Regular expression: G-X-[WY] (G, followed by any residue, followed by W or Y)

Interpreting regular expressions:
Example: E-X(2)-[FHM]-X(4)-{P}-L
Interpretation: the first residue of the pattern is E; followed by any 2 residues; followed by F, or H, or M; followed by any 4 residues; followed by any residue except P; followed by L.

Limitations of regular expressions: they do not take into account sequence probability information about the multiple sequence alignment. For instance, in the above example, we don't know how often F, H, and M each occur at the 4th position in this motif. H may be much more common than F or M, but we have no way of knowing this from the regular expression.

Which sequence does not contain the domain motif defined by the regular expression AR[ND]C(2)E?
A. ARNCCE
B. ARDCCE
C. ARNDCE

How are motifs and domains in protein families represented?
• Regular expressions
• PSSMs, profiles, and profile hidden Markov models: numerical representations of a multiple sequence alignment that contain information about the probability of observing a specific residue at a given location in the alignment.
[Figure: logogram.]

Determining if a sequence of interest contains a motif or domain represented by a probabilistic model:
We would use the profile to "scan" the new protein's sequence:
X X X X X X X X X X ... → score1 (calculate score for occurrence of motif beginning at residue 1)
X X X X X X X X X X ... → score2 (calculate score for occurrence of motif beginning at residue 2)
. . . (continue scanning until end of sequence is reached)
... X X X X X X X X X X → scoreN (calculate score for occurrence of motif at last possible position)
The highest scoring location is the most likely position of the motif/domain in the sequence.

Databases of motifs and domains
The following are databases of regular expressions, PSSMs, profiles, and/or profile HMMs derived from alignments of motifs and domains found in protein families. You can submit a protein sequence to any of these databases in order to determine if the sequence contains one of the motifs or domains represented in the database.
• Pfam (http://pfam.xfam.org): Uses profile HMMs. Two-part database: Pfam-A (curated) and Pfam-B (automatically generated).
• InterPro (http://www.ebi.ac.uk/interpro/): An integrated database designed to unify multiple databases, including PROSITE, Pfam, PRINTS, ProDom, SMART, and others. Note: searching InterPro may produce different results than searching the individual databases that are part of InterPro.
• CD-SEARCH (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi): Uses profiles. Includes the SMART and Pfam databases.
• SMART (http://smart.embl-heidelberg.de/): Uses profile HMMs. Alignments of domains checked manually by curators.

Identify putative function of protein query sequence with the CD-Search tool
RPS-BLAST output:
Drosophila_melanogaster_Discs_large_5_Q9VKG8 2d1dd0159d53e777a070d719964b0545 1916 Pfam PF00625 1903 T 21-10-2020 IPR008145 Guanylate kinase 1773 3.3E-11 Guanylate kinase/L-type calcium channel beta subunit
[Figure labels: Guanylate kinase, PDZ domain, SH3 domain.]

Questions about your Gene Family (for your paper)
• Does domain content vary between orthologs and paralogs?
• Does domain content vary between cnidarians and bilaterians?
• What does the domain content reveal about the potential functional roles of your proteins?
• What does the domain content reveal about the evolution of functional roles?
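The profile-scanning idea described earlier in this section (score every possible start position, then report the highest-scoring one) can be sketched with a simple position-specific scoring matrix. This is a minimal illustration built from the six-sequence GEW/GTW/GTY/GRW/GKW/GAW alignment used in the regular-expression example. The function names, the pseudocount of 1, the uniform background frequency, and the query sequence are all assumptions made for the sketch; Pfam, CD-Search, and SMART use more elaborate models.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Build a log-odds PSSM (one dict of scores per alignment column)."""
    n = len(alignment)
    width = len(alignment[0])
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for col in range(width):
        column = [seq[col] for seq in alignment]
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / (n + pseudocount * len(AMINO_ACIDS))
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def scan(sequence: str, pssm: list[dict[str, float]]) -> list[tuple[int, float]]:
    """Score every window of the sequence; returns (1-based start, score).

    Assumes the sequence contains only the 20 standard amino acid letters.
    """
    width = len(pssm)
    results = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][aa] for i, aa in enumerate(window))
        results.append((start + 1, score))
    return results


motif_alignment = ["GEW", "GTW", "GTY", "GRW", "GKW", "GAW"]
pssm = build_pssm(motif_alignment)
hits = scan("MKLGTWAAP", pssm)  # invented query sequence
best_position, best_score = max(hits, key=lambda h: h[1])
print(best_position, round(best_score, 2))
```

Unlike the regular expression G-X-[WY], the matrix records that W is more frequent than Y in the third column, so a window ending in W scores higher than one ending in Y.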
In Lab…
• You will be using RPS-BLAST to search for domains in your sequences using PFAM_A HMMs.
• You will be visualizing changes in domain content on your phylogeny.

Questions for you to look up about your gene family…
• What domains do members of your gene family contain?
• What is the function of each of these domains?
• What are the Pfam accessions of these domains?
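Once domain hits are available, the first question above (what domains each family member contains) comes down to grouping hits by protein and ordering them along the sequence. The sketch below assumes a hypothetical six-column, tab-delimited hit table (protein, Pfam accession, domain name, start, end, E-value); this layout is an assumption for illustration and is not the native RPS-BLAST output format. The hits themselves are invented, and the E-value cutoff is arbitrary.

```python
import csv
import io
from collections import defaultdict

# Invented toy hits in an assumed column layout:
# protein, Pfam accession, domain name, start, end, E-value
TOY_HITS = """\
proteinA\tpfam04617\tHox9_act\t1\t150\t1e-30
proteinA\tpfam00046\tHomeobox\t200\t256\t2e-25
proteinB\tpfam00046\tHomeobox\t120\t176\t5e-22
proteinB\tpfam13293\tDUF4074\t300\t360\t0.5
"""


def domain_architectures(handle, max_evalue: float = 1e-5) -> dict[str, list[tuple[str, str]]]:
    """Map each protein to its (accession, name) domains in N-to-C order."""
    hits = defaultdict(list)
    for query, accession, name, start, end, evalue in csv.reader(handle, delimiter="\t"):
        if float(evalue) <= max_evalue:  # drop weak hits
            hits[query].append((int(start), accession, name))
    return {
        query: [(accession, name) for _, accession, name in sorted(found)]
        for query, found in hits.items()
    }


architectures = domain_architectures(io.StringIO(TOY_HITS))
for protein, domains in architectures.items():
    print(protein, " + ".join(name for _, name in domains))
```

Comparing these per-protein domain lists across the leaves of the gene tree is one way to see where a domain was gained or lost, for example between orthologs and paralogs.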