About the course
The analysis of data (DNA, music, images, video, news, blogs, medical records, software, computer game logs, multimedia, social networks, environmental signals) is an important frontier in computer science. This frontier is expanding rapidly thanks to new developments in mathematical modelling, algorithms, data management and computing infrastructure. It is having a profound impact not only in science and medicine, but also in e-commerce, marketing and business in general. Inference and learning with massive datasets is the key ingredient of the intelligent machines of the future.
This course will provide an introduction to this exciting, growing field. It will teach the basic principles and skills required for analysing data in a principled way: finding statistical patterns, dimensionality reduction, clustering, classification and prediction. Students will also have the opportunity to learn Python, a widely used programming language.
Grading
Assignments: 30%
Midterm 1: 20% (Wed Feb 8)
Midterm 2: 20% (Wed March 21)
Final: 30% (April 18, 3:30pm)
There will also be a special research project/competition for a few bonus marks. Details TBA.
The instructor has the right to change the marking scheme under reasonable circumstances.
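As a rough illustration of how the weights above combine, here is a minimal Python sketch. The function name course_mark and the component keys are made up for this example, scores are assumed to be percentages out of 100, and the bonus marks are left out because their details are TBA.

```python
# Weights in percentage points, copied from the marking scheme above.
WEIGHTS = {"assignments": 30, "midterm1": 20, "midterm2": 20, "final": 30}


def course_mark(scores):
    """Return the weighted course mark (0-100).

    `scores` maps each component name in WEIGHTS to a percentage (0-100).
    This is only a sketch: bonus marks are not modelled.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError("missing scores for: " + ", ".join(sorted(missing)))
    # Sum weight * score over all components, then divide by 100
    # because the weights are expressed in percentage points.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {"assignments": 80, "midterm1": 70, "midterm2": 90, "final": 60}
    print(course_mark(example))  # 0.3*80 + 0.2*70 + 0.2*90 + 0.3*60 = 74.0
```

Keeping the weights as whole percentage points avoids floating-point rounding in the weighted sum.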
Assignments
Assignments will involve both written and Python programming problems.
All assignments are due on the specified date at 4pm. They are to be handed in at the classroom where the lecture takes place.
Missing the deadline from Friday to Monday costs 40%, from Monday to Wednesday 30% and from Wednesday to Friday 30%. For example, if someone misses a Friday deadline and hands in the homework on the subsequent Wednesday, the penalty is 70%. Late homeworks will be marked only after the homeworks that were handed in on time have been marked. Hence, expect a delay in receiving your mark if you hand in late.
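The late-penalty arithmetic can be sketched in Python as follows. This is only an illustration under stated assumptions: late homework is handed in at a later lecture (Mon, Wed or Fri), the penalties add up one lecture interval at a time, and the total is capped at 100%. The cap and the function name late_penalty are assumptions of this sketch, not part of the policy text.

```python
# Cost in percentage points of each lecture-to-lecture interval, from the policy above.
INTERVAL_COST = {("Fri", "Mon"): 40, ("Mon", "Wed"): 30, ("Wed", "Fri"): 30}
# Lectures run Mon, Wed, Fri, so each lecture day has a fixed successor.
NEXT_LECTURE = {"Mon": "Wed", "Wed": "Fri", "Fri": "Mon"}


def late_penalty(deadline_day, lectures_late):
    """Penalty (in %) for handing in `lectures_late` lectures after the deadline.

    `deadline_day` is "Mon", "Wed" or "Fri"; `lectures_late` is 0 for on time.
    Capping the result at 100 is an assumption of this sketch.
    """
    if deadline_day not in NEXT_LECTURE:
        raise ValueError("deadline_day must be 'Mon', 'Wed' or 'Fri'")
    if lectures_late < 0:
        raise ValueError("lectures_late must be non-negative")
    penalty = 0
    day = deadline_day
    # Walk forward one lecture at a time, adding the cost of each interval crossed.
    for _ in range(lectures_late):
        following = NEXT_LECTURE[day]
        penalty += INTERVAL_COST[(day, following)]
        day = following
    return min(penalty, 100)


if __name__ == "__main__":
    # Friday deadline, handed in the following Wednesday: 40 + 30 = 70.
    print(late_penalty("Fri", 2))
```

Under these assumptions, a homework that is a full week late (three lectures) loses 100% regardless of which day it was due.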
Messy homeworks will be penalized: it is your responsibility to ensure that the material is presented in a clear written form. All pseudocode must be handed in. Please don't forget to add your name and student number. Please staple your homework.
Academic honesty is important. If you find the answer to the homework on a website, in a book, etc., please acknowledge this on the front page of your homework. You will not be penalized. We like people who acknowledge the source and we don't mind if you seek help from friends to solve the homework problems. However, you must ensure that you understand what you are doing, otherwise you're missing out on the learning experience. Not doing the homeworks will also impact your performance on the exams.
Logistics
Time: Mon Wed Fri 4:00pm-5:00pm
Location: Hugh Dempster Pavilion 110
Instructor: Nando de Freitas (nando@cs)
TAs: Matt Hoffman (hoffmanm@cs), Bobak Shahriari (bshahr@cs) and Nathan Tomer (ntomer@cs)
Tutorial 1 (Nathan): Mon 11:00-12:00 (Hugh Dempster Pavilion 101)
Tutorial 2 (Matt): Mon 3:00pm-4:00pm (MacMillan 160)
Tutorial 3 (Bobak): Fri 3:00pm-4:00pm (Hugh Dempster Pavilion 101)
Office hours (Nando): Fri 1:00pm-3:00pm (ICICS 146)
Online discussion: cpsc340 Google group
RECOMMENDED READING:
- The machine learning book by Hastie, Tibshirani and Friedman is now online: The Elements of Statistical Learning.
- Chapters 14, 15 and 20 of the artificial intelligence book by Stuart Russell and Peter Norvig are strongly recommended for this course. I'll provide partial photocopies of chapters 14 and 15 in class. Chapter 20 is available online.
- For graphical models and Beta-Bernoulli models, I recommend A Tutorial on Learning with Bayesian Networks by David Heckerman.
- Kevin Murphy has compiled a nice page about Bayesian learning.
- Wikipedia tutorial on the SVD
- The following handout should help you with linear algebra revision: PDF
USEFUL LINKS:
- Machine learning video lectures
- Why stats: NYTimes article
- A video lecture about Python's matplotlib package
- The machine learning course of Andrew Ng is available on YouTube and iTunes. It is strongly recommended.