Course Syllabus

About

The course primarily covers computational methods for processing and analyzing high-throughput genomic sequencing data, with a focus on developing probabilistic models for single-cell genomics. Topics include statistical inference, probabilistic graphical models, latent variable models, deep latent variable models, and their applications in single-cell genomics.

Lectures

Lectures will be held Monday/Wednesday from 3:00-4:30pm PST. Attendance is expected at all lectures, as course participation forms part of the final mark.

Textbooks

AOS: Larry Wasserman. All of Statistics

MML: Marc Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine Learning

PML: Kevin Murphy. Probabilistic Machine Learning

ECB: Bruce Alberts, Dennis Bray, Karen Hopkin, Alexander D. Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Essential Cell Biology

MBC: Bruce Alberts, Dennis Bray, Karen Hopkin, Alexander D. Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Molecular Biology of the Cell

BSA: Richard Durbin, Sean Eddy, Anders Krogh, and Graeme Mitchison. Biological Sequence Analysis

Lauren M. Sompayrac. How the Immune System Works

 

Homework

 

Schedule

 

 

Date Topic Slides Reading Paper/Others
1 07-09-2022

Introduction

Course logistics

Vector, Norm, Matrix

SVD, Single-cell RNA-sequencing

lec01-intro.pdf MML Ch. 1-4
2 12-09-2022

Cells

Cells, Nucleus, Chromosomes, DNA, RNA, Protein, The Central Dogma

 

 

ECB Ch. 1
3 14-09-2022

Probability Primer

Random Experiments

Random Variables

CDF, PDF, PMF

Important Continuous and Discrete R.V.s

Conditioning/Independence/Bayes' Rule

Expectation, Variance, Covariance

Conditional Expectations

 

 

AOS Ch. 1-3    
4 19-09-2022

Statistical Inference 

Bayesian and Frequentist Inference

MAP+ML

Beta-Binomial Model for Variant Detection

Conjugate Priors

AOS Ch. 6, Ch. 11.1-2

SNVMix

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2832826/

JointSNVMix

https://pubmed.ncbi.nlm.nih.gov/22285562/

 

5 21-09-2022

Generalized Linear Models

Linear Regression

Logistic Regression

Multi-class Logistic Regression

Poisson Regression

Negative Binomial Regression

GLM and the Exponential Family

PML Ch. 12

 

 

6 26-09-2022

Latent Variable Models & Probabilistic Graphical Models

Joint Distributions/Global and Local Latent Variables

Random Variables/Fixed Parameters/Plate Notations

Conditional Independence/D-separation

Markov Blankets

PML Ch. 3.6

Data Analysis with Latent Variable Models

http://www.cs.columbia.edu/~blei/papers/Blei2014b.pdf

7 28-09-2022

Finite Mixture Models

Mixture of Binomial

Gibbs Sampling

Bayesian Mixture Models

SNVMix

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2832826/

8 03-10-2022

Finite Mixture Models II

EM Algorithm for Missing Data Problems

Properties of the EM Algorithm

9 05-10-2022 Probabilistic Principal Component Analysis
10 10-10-2022 Probabilistic Principal Component Analysis II
11 12-10-2022 Variational Autoencoders
12 17-10-2022 Variational Autoencoders II
13 19-10-2022 Variational Autoencoders III

14 24-10-2022

HMM

Viterbi

15 26-10-2022

HMM II

EM for Learning Parameters

Sequence Alignment

Profile HMM

16 31-10-2022 Topic Model
17 02-11-2022 Probabilistic Matrix Factorization
18 07-11-2022 Graph Neural Networks
19 09-11-2022 Diffusion Models
20 14-11-2022 Causal Inference
21 16-11-2022 Guest Lectures
22 21-11-2022 Guest Lectures
23 23-11-2022 Student Presentations
24 28-11-2022 Student Presentations
25 30-11-2022 Student Presentations
26 05-12-2022 Project Presentations
27 07-12-2022 Project Presentations

Grading

The marks for the course will be distributed as follows (note: the instructors reserve the right to modify the marking scheme at any time, although the final marking scheme should be fairly close to that given here): 

  • A project (60%)
    • A project proposal: 10%, a one-page writeup of your proposed project
    • Project writeup and presentation: 50%. You will need to turn in your report (8 pages excluding references), code, and a presentation.
  • Course participation (20%)
    • Each student will present and lead the discussion of at least one paper (10%)
    • Taking notes: for each lecture, one person takes notes and sends them to me, and I will post them to Canvas (10%)

  • Homework (20%)
    • Two homework assignments, submitted on Canvas, covering mathematical derivations, programming, and interpretation of results

Participation

For course participation, each student will be responsible for leading the discussion of one paper in class. First, the presenting student will create a thread on Piazza one week before the paper presentation. Second, all other students will briefly post their thoughts on the paper in this thread. All posts must be submitted at least 24 hours before the paper presentation.

Final Project

The final project will form the major assessment in this course. The project can be done in teams of 1-2 people. The project should be a novel piece of research in the field of bioinformatic algorithms. Suitable topics include the development of a new algorithm, theoretical result, or computational method. The final project will be assessed in two ways. First, you will deliver a 25-minute presentation to the class with 5 minutes of question time. Second, you will deliver a written report (8 pages excluding references) in the style of a bioinformatics journal article.

Course communication

Piazza will be used for course communication including announcements, questions about lectures and any other logistics. A link to the Piazza group can be found on the left of the Canvas page.

 
