CPSC 536A - Bioinformatics Course, Spring 2001

[General Info] · [Course Outline] · [Course Projects] · [Grading] · [Assignments & Quizzes] · [Resources]

General / Administrative Information

Course number / title: 536A (201): Topics in Algorithms and Complexity - Bioinformatics
Catalogue number: 40449

Time: Tue 10:00-11:30, Thu 13:00-14:30
Room: FSC 1001 (Tue), CISR 304 (Thu)

Instructors: Anne Condon / Holger Hoos

[Current official information on graduate courses in 2000/01 term 2 can be found here]

Course Description:

Bioinformatics involves the application of computational methods in order to address problems in molecular biology. This course will provide a graduate introduction to algorithms and their applications in bioinformatics. Topics in molecular biology that will motivate the algorithmic content of the course include: sequence alignment, phylogenetic tree reconstruction, prediction of RNA and protein structure, gene finding and sequence annotation, gene expression, and biomolecular computing.

This is an interdisciplinary course, and the goal is to involve students who have either a strong computer science background or a strong background in molecular biology (such as students in the genetics graduate program), but not necessarily both. It is understood that students from these groups will have different skills and experience, and course lectures and assignments will take this into account. However, all students should already have a solid background in computer programming and should be comfortable with mathematical reasoning, such as can be obtained in a college level course in Mathematics or Statistics. Background in discrete mathematics or in probability theory is especially relevant to the course content.

Class assignments will familiarize students with biological data and tools for understanding this data and will help students gain a solid understanding of principles for design and analysis of algorithms. Some assignments will be involve use and extension of software tools, and others will involve written studies of algorithms and their analysis. Class projects will bring together students with different backgrounds to apply ideas from the course to a problem in molecular biology.

Course Outline (Draft)

Module 1: Introduction and Basics (hh; 1.5wk)

Module 2: Sequence Alignment (hh+ac; 2-2.5wk)

Resources: [BSA:2]

Module 3: Phylogenetic Trees (ac(2)+hh(1); 1.5wk)


[CKR] has notes from a lecture by Felsenstein, a poineer of maximum likelihood methods for phylogeny. Many of the other web-based course materials also include notes on phylogeny, with [CRS] being quite comprehensive.

[Fe78] Felsenstein J. Inferring Phylogenies. (In print yet?)

[Fe78] Felsenstein J. Cases in which parsimony or compatibility methods will be positively misleading, Syst. Zoology, 27:401-410.

[H92] Hillis, D., Bull, J.J., White, M.E., Badgett, M.R., and Molineux, I.J.. Experimental phylogenies: generation of a known phylogeny, Science 255 (1992), 589-592.

[H97] Hillis, D., Inferring Complex Phylogenies, Nature, Vol. 383, pp 130-131.

[Mo00] Moret, B.E., Wang, L., Warnow, T., and Wyman, S.K., Highly Accurate Reconstruction of Phylogenies from Gene Order Data, Manuscript.

[SO] Swofford D. L. and Olsen, G. J. Phylogeny Reconstruction. Chapter 11, pp 411-501 in D.M. Hillis and C. Moritz, eds. Molecular Systematics, Sinauer Associates, Sunderland, Massachisetts.

[Wa96] Warnow, T. Some Combinatorial Optimization Problems in Phylogenetics. Proc. International Colloquium on Combinatorics and Graph Theory, Balatonlelle, Hungary (1996), in Vol. 7 of Bolyai Society Mathematical Studies, A. Gyarfas, L. Lovasz, and L.A. Szekely, eds., Budapest, 363-413, 1999.

[Wa01] St. John, K., Warnow, T., Moret, B.M.E., and Vawter, L. Performance Study of Phylogenetic Methods: (Unweighted) Quartet Methods and Neighbor-Joining. To appear.

[Wo98] Woese, C. The Universal Ancestor. Proc. Natl. Acad. Sci. USA, Vol. 95, pp 6854-6859, June 1998.

Module 4: RNA and Protein Structure (ac+hh; 2wk)


[CKR] Karp-Ruzzo lecture notes again.

[I00] Isambert, H. and Siggia, E.D. Modeling RNA folding paths with pseudoknots: application to hepatitis delta virus ribozyme. Proc. Nat. Acad. Sci. 97L12, pp 6515-6250, June 6, 2000.

[L00] Lyngso, R.B. and Pedersen, C.N.S., Pseudoknow Prediction in Energy Based Models, RECOMB 2000.

[Ho94] I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoffer, M. Tacker, and P.Schuster, ``Fast folding and Comparison of RNA Secondary Structures.'' Monatshefte f. Chemie, Volume 125, 1994. pages 167-188. See also http://www.tbi.univie.ac.at/\~ivo/RNA.

[McC90] L.S. McCaskill, ``The Equilibrium Partition Function and Base Pair Binding Probabilities for RNA Secondary Structure,'' Biopolymers, Vol 29, 1990, pages 1105-1119.

[R99] E. Rivas, S.Eddy, ``A dynamic Programming Algorithm for RNA Structure Prediction including Pseudoknots,'' Journal of Molecular Biology, vol 285. 1999. pages 2053-2068.

[Z81] M. Zuker and P.Stielger,''Optimal computer folding of large RNA sequences usingthermodynamics and auxiliary information,'' Nucleic Acids Res 9,(1981), pages 133-148. See also http://www.ibc.wustl.edu/\~zuker/rna/.

Module 5: Gene Finding and Sequence Annotation (hh; 1-1.5wk)

Module 6: Gene Expression Analysis (ac; 1wk)


Shamir and Sharon, Algorithmic Approaches to Clustering Gene Expression Data (available at
http://www.math.tau.ac.il/~rshamir/papers.html - scroll to the end of that page to find the paper).

Ben-Dor et al. Tissue Classification with Gene Expression Profiles, Recomb 2000.

Golub et al. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 286, 531-537, 1999.

M.J. van der Laan and J. Bryan. Gene Expression Analysis with the Parametric Bootstrap, Biostatistics, 2001 (available at http://www.stat.berkeley.edu/users/jenny/pub.html).

S. Brenner et al., Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays, Nature Biotechnology, Vol. 18, June, 2000.

Module 7: Biomolecular Computing (ac; 2wk)


Erik Winfree's web page: see Erik's thesis and STOC 2000 paper for more on self-assembly computation.

Slides from first lecture (powerpoint format)

Other Topics (not to be covered this time):

Algorithms / approaches to be covered:

Course Projects

Projects in Progress

If you haven't submitted the HTML version of your (revised) project proposal yet, please do so ASAP via e-mail to hoos@cs.ubc.ca, subject 'CPSC 536A Course Projects'. The prefered method is for your project to setup a webpage which contains the proposal, reports, and relevant links (see the snoRNA group's page for an example).

General Information

Students are expected to select and complete a course project according to the following timetable:

01/09 project descriptions are now available
01/16 students select project (see below)
01/25 students submit project proposal
03/06 students submit progress report
04/03 students submit final report
04/18, 9:30-16:30 CPSC53A Bioinformatics Mini-workshop (project presentations)
04/19, 13:00-16:00 BETA-Lab Open House - all CPSC536A participants are cordially invited!

Students should work in groups of two or three, preferably combining different background and expertise in each team. The project proposals and reports will be reviewed and evaluatated according to standard criteria for research proposals / research papers.

As we intend to combine the final reports into a proceedings volume for the mini-workshop, they need to be formatted uniformly. Please follow the layout and formatting of this sample text as closely as possible. We recommend to use LaTeX for preparing the final document (in which case you can use the source of the sample text as a template for your report). Final reports should be submitted as PostScript or PDF files via e-mail to hoos@cs.ubc.ca.

At least one week before the workshop, the authors will receive reviews for their reports along with instructions for preparing the camera-ready copies for the mini-workshop proceedings (which mainly consist of the page numbers to be used).


Final grades will be determined from the following components:

Assignments & Quizzes

01/04 - due 01/09 Reading assignment: Douglas Hofstadter, The Genetic Code: Arbitrary? (available from the Reading Room)
01/11 - due 01/16 Reading assignment: Baldi and Brunak, Bioinformatics - a machine learning approach, Chapter 1 (available from the Reading Room)
Remark: Sections 1.4.4 and parts of 1.4.5 are not essential.
01/16 - due 01/23 Assignment 1 (covers Module 1)
01/18 - due 01/23 Reading assignment: Durbin, Eddy, Krogh, Mitchison: Biological sequence analysis, Chapter 2, Section 2.3 (available from the Reading Room)
02/01 - due 02/08 Assignment 2 (covers Module 2)
02/09 Solutions to Quiz 1
02/01 - due 02/08 Assignment 3 (covers Module 3)
03/06 - due 03/13 Reading assignment:
1) Baldi and Brunak, Bioinformatics - a machine learning approach, Sections 5.1, 6.2 (available from the Reading Room)
2) Schulze-Kremer, Genetic Algorithms and Protein Folding. (available electronically at http://www.techfak.uni-bielefeld.de/bcd/Curric/ProtEn/proten.html
03/20 - due 03/27 Assignment 4 (covers Module 4)
03/22 - due 03/27 Reading assignment:
1) Shamir and Sharon, Algorithmic Approaches to Clustering Gene Expression Data (available from the Reading Room and also electronically at http://www.math.tau.ac.il/~rshamir/papers.html - scroll to the end of that page to find the paper.


Lecture Notes:

Lecture notes are also available as hardcopies in the CISCR Reading Room


Papers / online articles:

Courses / course materials on the web:

Resource lists on the web

Bioinformatics tools / software / databases

last update 01/04/10, hh