Choose a protein from your textbook. Write the NAME OF THE PROTEIN, THE BOOK CHAPTER, AND
THE PAGE NUMBER AND/OR SECTION and turn in a hard copy to me by MONDAY OCT. 14, 2019.
Obtain FIVE different sequences (from five different organisms) from EACH of the following programs. In your
presentation, discuss the differences and similarities between these two sites: ease of use; differences and
similarities in the presentation and organization of data (e.g., accession number, species name [whether common or
scientific], etc.); and what the data in the first line means.
• To use Uniprot: At the top of the home page enter the name of your protein
(under query; make sure that to the left of the search box it reads ‘UniprotKB’)
and click 'search'.
• Once the list populates, determine if you need to make your search more
specific. (If you have more than 100 results, be more specific about how you
search: e.g., Hemoglobin vs. HBG1 Gamma A. Both are hemoglobin; however,
Gamma A is a specific subunit sequence from one of the chains.)
• Select 5 DIFFERENT organisms from the list with the SAME protein name as
you searched by clicking on the radio button on the left.
• After you choose your five proteins, at the top of the list hit Add to Basket. You
will immediately see the five proteins show up in the basket in the upper right
• Click on the basket to open it, select all five proteins, and click download.
Another window will pop up. Be sure that the box reads FASTA
(canonical), then click Go. Your proteins will show up in a Notepad document. Each sequence
will begin with >sp. Leave a blank line between sequences, then name and save this document as
a .txt file.
An example of Uniprot data:
>sp|P06836|NEUM_BOVIN Neuromodulin OS=Bos taurus GN=GAP43 PE=1 SV=3
The accession number (P06836), entry name made of abbreviated protein and species names (NEUM_BOVIN), protein name
(Neuromodulin), species (Bos taurus), and gene name (GAP43) are listed on the first line (point this out in your presentation).
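The first-line fields can also be pulled apart with a short script. The following is a minimal sketch, assuming the header layout shown in the example above (database tag, accession, entry name, protein name, then OS= for the organism and GN= for the gene name); the function name parse_uniprot_header is made up for illustration and is not part of any Uniprot tool.

```python
import re

# Assumed layout: >db|accession|entry_name protein name OS=organism GN=gene ...
HEADER = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry>\S+)\s+"
    r"(?P<protein>.+?)\s+OS=(?P<organism>.+?)(?:\s+[A-Z]{2}=|$)"
)
GENE = re.compile(r"\bGN=(\S+)")


def parse_uniprot_header(line):
    """Split one Uniprot FASTA header line into labeled fields (hypothetical helper)."""
    line = line.strip()
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not a Uniprot-style header: " + line)
    fields = match.groupdict()
    gene = GENE.search(line)
    fields["gene"] = gene.group(1) if gene else None  # GN= may be absent
    return fields


# The header from the example above.
example = ">sp|P06836|NEUM_BOVIN Neuromodulin OS=Bos taurus GN=GAP43 PE=1 SV=3"
info = parse_uniprot_header(example)
for key, value in info.items():
    print(key, "=", value)
```

Running this on each header in your Uniprot file gives a quick table of accession numbers and species to show on a slide.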
To use NCBI: On the home page (where it says 'All databases'), select ‘Protein’
from the drop-down menu.
• Type the name of your protein (note: Use the same name you searched in
Uniprot) in the query box. You may also search using the accession number.
You should find the EXACT SAME FIVE organisms you found in Uniprot in
the list. These databases are linked. You are confirming they match and the
sequences you are utilizing are verified by both sources. Match the accession
number if necessary.
• After you have located and selected each of them, select FASTA for each of
them. This must be done one by one. As you open each FASTA for each
organism, highlight the entire sequence.
>gi|37781182|gb|AAO60065.1| GAP43 [Bos taurus]
Notice the identifiers are also copied. The GI number
(37781182), GenBank accession number (AAO60065.1), protein name (GAP43),
and species name (Bos taurus) are listed on the top line.
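The NCBI top line can be split the same way. This is a minimal sketch, assuming the pipe-delimited layout of the example above, in which the field after gi is the GI number and the field after gb is the GenBank accession with its version; the helper name parse_ncbi_header is made up for illustration. It also accepts a header that starts directly with the accession, in case your download has no gi field.

```python
def parse_ncbi_header(line):
    """Split an NCBI FASTA header into labeled fields (hypothetical helper)."""
    text = line.strip().lstrip(">")
    ids, _, description = text.partition(" ")
    parts = ids.strip("|").split("|")
    if len(parts) >= 4:
        # e.g. ['gi', '37781182', 'gb', 'AAO60065.1']
        gi, source_db, accession = parts[1], parts[2], parts[3]
    else:
        # Header with only an accession before the first space.
        gi, source_db, accession = None, None, parts[0]
    title, organism = description, None
    if description.endswith("]") and "[" in description:
        # The species name sits in the last pair of square brackets.
        title, _, organism = description[:-1].rpartition("[")
    return {
        "gi": gi,
        "source_db": source_db,
        "accession": accession,
        "title": title.strip(),
        "organism": organism,
    }


# The header from the example above.
ncbi_info = parse_ncbi_header(">gi|37781182|gb|AAO60065.1| GAP43 [Bos taurus]")
for key, value in ncbi_info.items():
    print(key, "=", value)
```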
• Open a new notepad file (same as previously) and copy each FASTA into the
file. Save all 5 complete sequences into the same .txt notepad file and save under
another file name.
• You should now have 2 different notepad files-one entirely from Uniprot and
one entirely from NCBI.
• You will upload each of these files into one of the programs below.
Do a multiple sequence alignment using Clustal omega AND one of the following programs
(either a, b, or c below) from the internet (Use default parameters in the programs). Before
doing your multiple sequence alignment, delete the accession numbers and other identifiers
and change the name of your organism to the common name (for example, change Homo
sapiens to human). In your presentation compare and contrast how well the two different
programs align the five sequences.
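Renaming the headers by hand works, but it is easy to miss one. The following is a minimal sketch of the same cleanup, assuming each header mentions the scientific name somewhere on the line; the function name relabel_fasta, the name table, and the short sequences are placeholders made up for illustration, not real data.

```python
def relabel_fasta(text, common_names):
    """Replace every FASTA header with '>common name' (hypothetical helper)."""
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            for scientific, common in common_names.items():
                if scientific.lower() in line.lower():
                    line = ">" + common  # drops accession numbers and other identifiers
                    break
            else:
                raise ValueError("no common name known for: " + line)
        out.append(line)
    return "\n".join(out) + "\n"


# Placeholder table and placeholder sequences (not real protein data).
names = {"Homo sapiens": "human", "Bos taurus": "cow"}
raw = (
    ">sp|P06836|NEUM_BOVIN Neuromodulin OS=Bos taurus GN=GAP43 PE=1 SV=3\n"
    "ACDEFGHIK\n"
    "\n"
    ">placeholder_id Example protein [Homo sapiens]\n"
    "ACDEFGHLK\n"
)
cleaned = relabel_fasta(raw, names)
print(cleaned)
```

Save the cleaned text as a new .txt file and upload that file to the alignment program, so the species labels in the alignment are easy to read.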
Procedure: Multiple Sequence Alignment
The above programs are all similar: I am highlighting Clustal Omega only for these instructions but
make sure you also do MSA, MAFFT, or MUSCLE in addition to Clustal Omega.
Access the Clustal Omega homepage (https://www.ebi.ac.uk/Tools/msa/clustalo).
Select browse then locate the 1st notepad (.txt file) you created from Uniprot. (I
find it easiest if you save all files to either a specific folder or simply to your
• Once the file is in the browse box, scroll to the bottom of the page and select
• A new window will open as the data is being processed. Then you will see your
first alignment file. Bookmark this page! Make sure that you label the
bookmark so that you can distinguish that this is the alignment generated from
the Uniprot search.
• Open another browser window or new tab and repeat the step above for your
other text file. Both of these files should be identical, but if they are not, make
note of the differences in the chains. You can print each of them and compare
differences and include your findings in your presentation. However, more than
likely they will be the exact same sequences. You have now verified your
sources for the alignment.
• You will need to bookmark both pages as these screenshots will be slides in your
• DURING YOUR PRESENTATION, DISCUSS THE MAJOR SIMILARITIES
AND DIFFERENCES BETWEEN THE TWO PROGRAMS. POINT OUT THE
SIMILARITIES AND DIFFERENCES IN YOUR PROTEINS BY USING THE
*, ., AND : SYMBOLS AT THE BOTTOM OF EACH ROW OF SEQUENCES.
WHAT DO THE DASHES MEAN?
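The check that the Uniprot file and the NCBI file hold the same sequences can also be done directly on the two text files. This is a minimal sketch, assuming both files already use the same common-name headers; read_fasta and compare_sources are names made up for illustration, and the short sequences are placeholders. The position count is a naive comparison of unaligned sequences, so use it only as a flag that something differs.

```python
def read_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines between sequences
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def compare_sources(uniprot_text, ncbi_text):
    """Report, per organism, whether the two files hold the same sequence."""
    a, b = read_fasta(uniprot_text), read_fasta(ncbi_text)
    report = {}
    for name in sorted(set(a) | set(b)):
        if name not in a or name not in b:
            report[name] = "missing from one file"
        elif a[name] == b[name]:
            report[name] = "identical"
        else:
            diffs = sum(x != y for x, y in zip(a[name], b[name]))
            diffs += abs(len(a[name]) - len(b[name]))
            report[name] = f"{diffs} position(s) differ"
    return report


# Placeholder sequences (not real protein data).
uniprot_text = ">human\nACDE\nFG\n\n>cow\nACDEFG\n"
ncbi_text = ">human\nACDEFG\n>cow\nACDQFG\n"
report = compare_sources(uniprot_text, ncbi_text)
for organism, status in report.items():
    print(organism, ":", status)
```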
AT THIS POINT YOU HAVE TWO ALIGNMENT PAGES BOOKMARKED FROM
CLUSTAL OMEGA AND ONE OF THE OTHER PROGRAMS ABOVE. YOU ALSO
HAVE 2 SAVED TEXT FILES (NOTEPAD FILES) OF YOUR SEQUENCES READILY
FROM THIS POINT FORWARD, USE ONLY ONE OF THE MULTIPLE SEQUENCE
ALIGNMENTS FOR THE FOLLOWING PROGRAMS.
Find domains using GeneDoc (can be downloaded free from the internet). In your
presentation, discuss the similarities and differences in the domains across species. Use the
information from Clustal Omega or one of the other multiple sequence alignment programs but
Access (http://www.nrbsc.org/gfx/genedoc/) and scroll down to the download
link. Select it and save the file. Install the program on your PC (the program
CANNOT be installed on University computers!). If you cannot install
GeneDoc, come to my lab and use the lab computer.
• Once installation is complete, open the program. (A screen will open that is
almost entirely blank.)
• At the top, select file.
• A drop-down menu will appear. Select import.
• A small window will appear. The default settings should be FASTA and FILE.
If not, select those options and select IMPORT.
• Another file load window will appear. Select your text file from earlier and
• The alignment should populate almost immediately on your screen. Hit done so
that the file box will close.
• At the top of the GeneDoc screen select the icon labeled 'C'.
• Another drop box will appear.
Select shading tab.
Looking at percentage of conservation (shading levels),
background colors [ choose 'fore' and 'back' to change colors]
This is your GeneDoc alignment file. Save the file to your desktop or folder you have saved the text
files in. You will need this screenshot for your presentation.
5. Search for protein domains. Copy and paste the following URL into your browser:
https://www.ncbi.nlm.nih.gov/Structure/cdd/docs/cdd_search.html . Insert one of your protein FASTA
sequences into the text box and click ‘submit’.
Q#1 - >rat ((Local ID))
Show functional sites
These are your domains. If you click on the box, there will also be an explanation of what each
domain does. Include the explanation in your presentation.
6. Generate a score table
How to generate a scores table of your organisms' protein alignments:
• Return to the Clustal Omega page you bookmarked earlier.
• Select the results summary tab
• Scroll down and click on ‘Percent Identity Matrix’.
• You should see the scores table on this screen.
• Bookmark this page. (Name it scores table.) You will need this screenshot for
• IN YOUR PRESENTATION, USE THE NUMBERS TO COMPARE HOW
CLOSELY (OR NOT) RELATED YOUR PROTEINS ARE.
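To see where the numbers in a scores table come from, the percentages can be recomputed from the aligned sequences. This is a minimal sketch under one stated assumption: identity is counted only over columns where neither sequence has a dash. Clustal Omega's own matrix may use a different convention, so small differences are possible. The function names and the short aligned sequences are made up for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical residues over columns with no dash in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # a dash marks a gap, so the column is skipped
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Build {name: {name: percent identity}} for every pair of sequences."""
    names = list(aligned)
    return {
        a: {b: percent_identity(aligned[a], aligned[b]) for b in names}
        for a in names
    }


# Placeholder alignment (not real protein data).
aligned = {"human": "MKT-AVL", "mouse": "MKTQAVL", "dog": "MRT-AIL"}
matrix = identity_matrix(aligned)
for name, row in matrix.items():
    print(name, " ".join(f"{row[other]:6.2f}" for other in aligned))
```

Higher percentages mean the two proteins share more identical residues at aligned positions.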
Build a phylogenetic tree of your species.
Return to the Clustal Omega page you bookmarked earlier.
Select ‘Phylogenetic Tree’.
Bookmark this page. (Name it phylogenetic tree.) You will need this screenshot for
In your presentation, discuss the relationship among the five species regarding how closely
(or not) they are related (based on % identity) through evolution. Describe which one of the
five species is the outgroup (if any) and define what an outgroup is. Do you have sister
and/or basal taxa? Discuss the answers to these questions in your presentation.
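When reading the tree against the percent identity numbers, it can help to find the species whose protein is, on average, least similar to the other four. This is a minimal sketch of that comparison only: it is a rough heuristic, not the method Clustal Omega uses to build its tree, and the species it flags is merely a candidate to check against the tree. The function name most_divergent and the numbers in the table are made up for illustration.

```python
def most_divergent(matrix):
    """Return the name with the lowest average identity to the others, plus all averages."""
    averages = {}
    for name, row in matrix.items():
        others = [value for other, value in row.items() if other != name]
        averages[name] = sum(others) / len(others)
    return min(averages, key=averages.get), averages


# Hypothetical percent identity values (not real results).
scores = {
    "A": {"A": 100.0, "B": 90.0, "C": 40.0},
    "B": {"A": 90.0, "B": 100.0, "C": 42.0},
    "C": {"A": 40.0, "B": 42.0, "C": 100.0},
}
candidate, averages = most_divergent(scores)
print("lowest average identity:", candidate)
for name, value in averages.items():
    print(name, round(value, 1))
```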
Build a three-dimensional model using PDB.org or another program.
LABEL AREAS OF INTEREST on your 3-D model (This can be done using paint
Point out the location of the active site, secondary structures, and other domains
in your presentation.
How to put together all of the screenshots:
• Open each of the above bookmarked windows. This process must be done one
by one. So select the first window you bookmarked.
• Once you have the screen up, make sure that it is maximized to take up your
• Select the Prnt Scrn key on your keyboard.
• Now open Microsoft Paint or similar program.
• Hit Ctrl+V or right click and hit paste or select the paste option from the Paint
• The screenshot should appear.
• Hit the select button from the paint menu if you need to crop the image.
• Once you have surrounded the portion you need to crop, right click and select
• Only the part of the screenshot you wish to include remains.
• Hit CTRL+S and save the file (type will be .png by default and is fine) by the
title and slide number you want to use. This will make it easier for you to keep
the screenshots in the correct order for your presentation when you start
inserting them. Ex. 1SearchResultsUniprot, 2SearchResultsNCBI,
3AlignmentFileClustal, and so forth.
Present your findings as a Powerpoint presentation (not less than 5 minutes nor more than 6
minutes). For each program you use or each step that you do you WILL NEED TO SHOW
THE DATA ON A SLIDE AND add the source to your references. In addition, you will
need to have a title slide, an introduction slide (these come before your data slides), a
summary slide, and a slide with references (last slide). You should have a total of 15-16
slides. At a minimum, your title slide should have the name of the course, the name of the
protein, and your name. Your introduction slide(s) should give some background about the
protein, including the history of the protein, who discovered or isolated it and how, its
function, its molecular weight, its cellular location, and any other characteristics about the
protein that are pertinent to your discussion of the protein.
10. ORDER OF SLIDES (you will be graded on this specific order!):
Title slide (should contain information listed in the preceding paragraph)
Introduction slide(s) (should contain information requested above; no more than 2 slides;
information should be in bullet points—do not use long paragraphs)
NCBI data of sequences (do not show selection page; show the sequences)
Uniprot data of sequences (do not show selection page; show the sequences)
Clustal Omega multiple sequence alignment
Other multiple sequence alignment program data
Three dimensional model
Summary of research (not what you learned but what you discovered about your protein)
References (include ALL programs used and data cited)
How to insert screenshots into your Powerpoint presentation:
• Open Powerpoint
• Select insert
• Select picture
• Select your file
• You may need to adjust the size of the image to make it more easily viewable,
or even re-crop it to another size in paint.
This project cannot be done at the last minute. I will be available for questions throughout the
remainder of the semester. If you need help, I will be available after class until fall break. After
that, you’re on your own! You will be graded according to how well you follow instructions
AND how completely you do your research, organize it according to the instructions, and
present your data!
Glycophorin A : GYPA
Malcolm Finlay/Cell Biology 2110/Bioinformatics
Glycophorin A is the major intrinsic membrane protein of the erythrocyte.
It is a single-pass transmembrane glycoprotein and is expressed on
mature erythrocytes and erythroid precursor cells.
It is one of the major sialoglycoproteins expressed on human red blood
It bears the antigenic determinants for the MN blood group
The molecular weight is 16,331 Da, which stands for Dalton, the unit of
It is a component of the cell and plasma membrane
Homo sapiens - Human
Mus musculus - Mouse
Canis lupus familiaris - Dog
Sus scrofa - Pig
Pan troglodytes - Chimpanzee
Programs used to find data sequences
MAFFT & MUSCLE
NCBI Sequence Alignment
Uniprot Sequence Alignment
Clustal Omega Sequence Alignment
MAFFT Sequence Alignment
MUSCLE Sequence Alignment
It is one of the major sialoglycoproteins of human erythrocytes that carry MN blood group
antigens and act as ligands for viruses, bacteria, and parasites. It is very important for
cell and membrane function: without it, not only could the processes of the membrane
not be accomplished, but the shape and deformability of the human red blood cell
membrane (RBCM) could not be maintained.
European Bioinformatics Institute, Protein Information Resource, SIB Swiss Institute of
Bioinformatics. “Glycophorin-A.” GYPA - Glycophorin-A Precursor - Homo Sapiens
(Human) - GYPA Gene & Protein, 28 Feb. 2018, www.uniprot.org/uniprot/P02724.
“CD235a (Glycophorin A) Antibodies, Human.” Primary Antibodies - Antibodies - MACS
Flow Cytometry - Products - Miltenyi Biotec, www.miltenyibiotec.com/USen/products/macs-flow-cytometry/antibodies/primary-antibodies/cd235a-glycophorin-aantibodies-human-rea175-1-11.html.
CALIPHO Team - SIB - Swiss Institute of Bioinformatics. NeXtProt the Human Protein
Database, Platform and Annotation Knowledge Base,