Course Syllabus

About

The course primarily covers computational methods for processing and analyzing high-throughput genomic sequencing data, with a focus on developing probabilistic models for single-cell genomics. Topics include statistical inference, probabilistic graphical models, latent variable models, deep latent variable models, and their applications in single-cell genomics.

Lectures

Lectures will be held Monday/Wednesday from 3:00-4:30pm PST. Attendance is expected at all lectures, as course participation forms part of the final mark.

Textbooks

AOS: Larry Wasserman. All of statistics

MML: Marc Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for machine learning

PML: Kevin Murphy. Probabilistic machine learning

ECB: Bruce Alberts, Dennis Bray, Karen Hopkin, Alexander D. Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Essential cell biology

MBC: Bruce Alberts, Dennis Bray, Karen Hopkin, Alexander D. Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Molecular biology of the cell

BSA: Richard Durbin, Sean Eddy, Anders Krogh, Graeme Mitchison. Biological Sequence Analysis

Lauren M. Sompayrac. How the immune system works

 

Homework

 

Schedule

 

Lecture Date Topic Slides Reading Paper Others Scribe

1 07-09-2022

Introduction

Course logistics

Vector, Norm, Matrix

SVD, Single-cell RNA-sequencing

 lec01-intro.pdf MML Ch.1-4  

You can use Overleaf to write LaTeX:

https://www.overleaf.com/latex/templates

 

Using RStudio for data analysis:

https://education.rstudio.com/learn/beginner/ 

 CPSC545-lec01-scribe.pdf

2 12-09-2022

Cells

Cells, Nucleus, Chromosomes, DNA, RNA, Protein, The Central Dogma

scRNA-seq, Ambient RNA, Droplets, Empty Droplets, Doublets, Cell capture rates

UMI, 3' tag

 

lec02-cell.pdf

ECB Ch. 1,5,7 CPSC545-lec02-scribe.pdf
3 14-09-2022

Probability Primer

Random Experiments

Random Variables

CDF, PDF, PMF

Important Continuous and Discrete R.V.s

Conditional Probability/Independence/Bayes' Theorem

Expectation, Variance, Covariance

Conditional Expectations

 

lec03-probability-primer.pdf

AOS Ch. 1-3     lec03-probability-primer-scribe.pdf 
  19-09-2022

No class

 

       
4 21-09-2022

Statistical Inference 

Bayesian and Frequentist Inference

Bayesian, MAP, and Maximum Likelihood Estimation

Beta-Binomial Model for Variant Detection

Conjugate Priors

lec04-statistical-inference.pdf AOS Ch. 6, Ch. 11.1-2

SNVMix

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2832826/

JointSNVMix

https://pubmed.ncbi.nlm.nih.gov/22285562/

 

5 26-09-2022

Generalized Linear Models

Linear Regression

Logistic Regression

Multi-class Logistic Regression

Poisson Regression

Negative Binomial Regression

GLM and the Exponential Family

PML Ch. 11.1-11.2.2

Ch.12

 

 

 

6

28-09-2022

Latent Variable Models & Probabilistic Graphical Models

Joint Distributions

Global and Local Latent Variables

Random Variables/Fixed Parameters/Plate Notations

Conditional Independence/D-separation

Markov Blankets

PML Ch. 3.6

Data Analysis with Latent Variable Models

http://www.cs.columbia.edu/~blei/papers/Blei2014b.pdf

 

Optional: xseq

https://www.nature.com/articles/ncomms9554

7 03-10-2022

Finite Mixture Models

Mixture of Binomial

Monte Carlo Integration, Importance Sampling, Rejection Sampling, Gibbs Sampling, MCMC

Bayesian Mixture Models

PML Book2 Ch11-12

(Optional)

SNVMix

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2832826/

 

8 05-10-2022

Finite Mixture Models II

EM Algorithm for Missing Data Problems

Properties of the EM Algorithm

9 10-10-2022 Probabilistic Principal Component Analysis
10 12-10-2022 Principal Component Analysis II
11 17-10-2022 Variational Autoencoders
12 19-10-2022 Variational Autoencoders II
13 24-10-2022 Variational Autoencoders III

14 26-10-2022

HMM

Viterbi

15 31-10-2022

HMM II

EM for Learning Parameters

Sequence Alignment

Profile HMM

16 02-11-2022 Topic Model
17 07-11-2022 Probabilistic Matrix Factorization
09-11-2022 Midterm break, no class
18 14-11-2022 Graph Neural Networks
19 16-11-2022 Diffusion Models
20 21-11-2022 Causal Inference
21 23-11-2022 Guest Lectures
22 28-11-2022 Student Presentations
23 30-11-2022 Student Presentations
24 05-12-2022 Project Presentations
25 07-12-2022 Project Presentations

Grading

The marks for the course will be distributed as follows (note: the instructors reserve the right to modify the marking scheme at any time, although the final marking scheme should be fairly close to that given here): 

  • A project (60%)
    • A project proposal: 10%, a one page writeup of your proposed project 
    • Project writeup and presentation: 50%. You will need to turn in your report (8 pages excluding references), code, and a presentation.
  • Course participation (20%)
    • Each student will present and lead the discussion of at least one paper (10%)
    • Taking notes: for each lecture, one person takes notes and sends them to me, and I will post them to Canvas (10%)

  • Homework (20%)
    • Two homework assignments, submitted on Canvas, covering mathematical derivations, programming, and interpretation of results

Participation

For course participation, each student will be responsible for leading the discussion of one paper in class. First, the presenter will create a thread on Piazza one week before the paper presentation. Second, all other students will briefly post their thoughts on the paper in this thread. All posts must be submitted at least 24 hours before the paper presentation.

Final Project

The final project will form the major assessment in this course. The project can be done in teams of 1-2 people. The project should be a novel piece of research in the field of bioinformatic algorithms. Suitable topics include the development of a new algorithm, theoretical result, or computational method. The final project will be assessed in two ways. First, you will deliver a 25-minute presentation to the class, followed by 5 minutes of questions. Second, you will deliver a written report (8 pages excluding references) in the style of a bioinformatics journal article.

Course communication

Piazza will be used for course communication including announcements, questions about lectures and any other logistics. A link to the Piazza group can be found on the left of the Canvas page.

 
