CPSC445/545 - Assignment 2 (covers Module 2)
released: Tue, 04/10/12; due: Tue, 04/10/19
NOTE: CPSC 545 students are only required to solve problems 1 and 2;
CPSC 445 students are expected to solve all problems.
1 Local Sequence Alignment [7 marks]
Here is the dynamic programming matrix resulting from a run of
the standard pairwise local sequence alignment algorithm
(with linear gap penalty d=8)
on the protein sequences DEWDEH and NDWEHK, using the
(a) Complete the table, by replacing the ?'s with the appropriate
numbers. [4 marks]
(b) From the table, give the optimal local alignment for these two
sequences and its score. [3 marks]
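For reference, the recurrence behind such a table can be sketched as follows. This is a minimal illustration, assuming a hypothetical toy scoring function (+5 for a match, -4 for a mismatch) and made-up DNA strings rather than the protein substitution matrix and sequences used in this problem. The names local_alignment_matrix and toy_score are illustrative only.

```python
def local_alignment_matrix(x, y, score, d):
    """Fill the local-alignment DP matrix F for sequences x and y.

    score(a, b) returns the substitution score for residues a and b;
    d is the linear gap penalty (a positive number that is subtracted).
    """
    n, m = len(x), len(y)
    # Row 0 and column 0 stay at 0: a local alignment may start anywhere.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,                                            # start a new alignment
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # align x[i] with y[j]
                F[i - 1][j] - d,                              # gap in y
                F[i][j - 1] - d,                              # gap in x
            )
    return F


def toy_score(a, b):
    # Hypothetical scores, NOT the matrix used in the assignment.
    return 5 if a == b else -4


F = local_alignment_matrix("ACGT", "CGA", toy_score, d=8)
best = max(max(row) for row in F)  # score of the best local alignment
```

The optimal local alignment is then recovered by tracing back from the cell holding the maximum value until a cell with value 0 is reached.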
2 Multiple Sequence Alignment [10 marks]
(a) Determine the sum-of-pairs score for the following multiple sequence
alignment of DNA sequences, using the
scoring matrix in which a match gets a value of +4, a mismatch gets
a value of -1, and a (base,gap) pair gets a value of -2. (A
(gap,gap) pair gets a value of 0.) [3 marks]
GT - A
C - - A
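The scoring scheme described above can be sketched in a few lines. This is an illustrative sketch only: the three-row alignment in the example is hypothetical and is not the alignment given in this problem, and the function names pair_score and sum_of_pairs are assumptions made for the example.

```python
from itertools import combinations


def pair_score(a, b, match=4, mismatch=-1, gap=-2):
    """Score one pair of symbols taken from the same alignment column."""
    if a == "-" and b == "-":
        return 0          # (gap, gap) pair
    if a == "-" or b == "-":
        return gap        # (base, gap) pair
    return match if a == b else mismatch


def sum_of_pairs(alignment):
    """Sum pair_score over all pairs of rows, column by column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have equal length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


# Hypothetical alignment, used only to exercise the functions.
example = ["AC-T", "A-GT", "ACGA"]
example_score = sum_of_pairs(example)
```

In the example, the four columns contribute 12, 0, 0 and 2 respectively, for a total of 14.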
(b) Align the following two alignments (i.e., profiles) of
protein sequences. Use the BLOSUM50 matrix and linear gap
penalties with d=-8. [4 marks]
Hint: Make sure you understand the description
of profile alignment in Durbin et al. (pages 146-147).
R - CY
(c) Briefly explain the role of guide trees in progressive
multiple sequence alignment algorithms. What do the leaf and
internal nodes of a guide tree represent? [3 marks]
Hint: Review the section on progressive alignment methods in Durbin et al.
3 Similarity Search using BLAST (Hands-on Problem) [CPSC 445 students only; 12 marks]
Run a Protein-Protein Blast search (BLASTP) at the
NCBI web site
in order to find proteins similar
to the Matrix glycoprotein of Human coronavirus OC43 in the SWISSPROT database;
the sequence of this protein in FASTA format is as follows:
>gi|267362|sp|Q01455|VME1_CVHOC E1 glycoprotein (Matrix glycoprotein) (Membrane glycoprotein)
Perform the search using the BLOSUM62 scoring matrix, with gap opening cost = 11, gap extension cost = 1, Expectation = 10 and word size = 3.
(a) Answer the following questions:
- What is the number of hits reported by BLASTP? [1 mark]
- What is the number of hits with a bit score between 80 and 200? [1 mark]
- Which types of organisms do the proteins from the previous question belong to?
(Hint: Use the taxonomy report link on the BLAST result page) [1 mark]
- Some of these organisms are parasitic.
In which hosts are these organisms found?
(This can be easily inferred from the taxonomy report information) [1 mark]
- What types of disorders are caused by the viruses whose proteins
received a bit score between 80 and 200 in this search? [1 mark]
- Give the alignment of the Human coronavirus matrix protein sequence with the SARS coronavirus sequence.
Specify the number of identical residues and % sequence identity,
the number of similar residues and % similarity (in BLAST terminology: % positive),
and the number and % of gaps. [1 mark]
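BLAST prints these figures in the header of each alignment, so they can simply be read off. To check them by hand, the counts can be sketched as below. This assumes the percentages are taken over the full alignment length, including gap columns; the two short rows are hypothetical and are not the coronavirus sequences. Counting similar residues ("positives") is left out because it would need the substitution matrix.

```python
def alignment_stats(query_row, subject_row):
    """Count identities and gaps in one pairwise alignment.

    Both rows are aligned strings of equal length, with '-' marking a gap.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")
    if not query_row:
        raise ValueError("alignment must not be empty")
    length = len(query_row)
    pairs = list(zip(query_row, subject_row))
    identities = sum(1 for a, b in pairs if a == b and a != "-")
    gaps = sum(1 for a, b in pairs if a == "-" or b == "-")
    return {
        "length": length,
        "identities": identities,
        "pct_identity": 100.0 * identities / length,
        "gaps": gaps,
        "pct_gaps": 100.0 * gaps / length,
    }


# Hypothetical rows, used only to exercise the function.
stats = alignment_stats("MK-LV", "MKALI")
```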
(b) Are the scores
for the hits with bit scores between 80 and 200
significantly affected when using the PAM70 scoring matrix
instead of the BLOSUM62 matrix? [1 mark]
(c) Does the alignment between Human coronavirus and SARS proteins
change when using the PAM70 scoring matrix instead of the BLOSUM62 matrix?
If so, how? [1 mark]
(d) What does the expectation parameter mean?
What will happen if the expectation value is increased from
its default value of 10 to 100? [2 marks]
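As background for this question, the usual relation between a bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), can be sketched as follows. This is a simplified illustration: real BLAST uses effective (edge-corrected) query and database lengths, which are ignored here, and the lengths in the example are hypothetical. The function names are assumptions made for the example.

```python
def expected_hits(bit_score, query_len, db_len):
    """Expected number of chance alignments scoring at least bit_score.

    Uses E = m * n * 2**(-S') with raw lengths m and n (an assumption;
    BLAST itself works with effective lengths).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def passes_threshold(bit_score, query_len, db_len, expect=10.0):
    """True if a hit with this bit score would be reported at the given threshold."""
    return expected_hits(bit_score, query_len, db_len) <= expect


# Hypothetical query of 100 residues against a database of 1,000,000 residues.
e_value = expected_hits(20, 100, 1_000_000)
```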
(e) Based on the respective alignments,
would you say that the Human coronavirus sequence is very similar
to the sequence for the SARS matrix protein? [1 mark]
(f) Human coronavirus OC43 belongs to group 2 of the mammalian
coronaviruses. One of the BLAST hits in the search you have
performed is a matrix protein from a group 1 human coronavirus
(229E). Which of these two matrix proteins belonging to two
different groups of human coronaviruses is more similar to
SARS matrix protein? (Hint: you will have to perform another
BLAST search using the FASTA sequence of the group 1 human
coronavirus protein; click on the web link for that protein
which will take you to its GenBank record.) [1 mark]
While cooperation between students - especially between CS and non-CS students
- is encouraged, each student is expected to work out the actual solutions
to the problems individually and hand in their own assignment.
In other words: help each other, but do not copy solutions.
Always feel free to contact Holger, Baharak, or Sanja if you need more help than
your fellow students can provide.
- The assignment has to be handed in on the due date, before or
at the beginning of class.
- This assignment should take you about 1.5-3 hours of work, if you have
a good knowledge of the topics covered and have done all reading assignments.
However, don't wait until the last minute relying on this estimate
- it might not apply to you (or anyone at all), you might need additional
time to consult the literature, ...