Instructor: Mark Schmidt.

Instructor office hours: Tuesdays at 3-4pm (ICICS 146).

Tutorials (beginning September 11):

- Mondays from 5-6 (DMP 101).
- Tuesdays from 3:30-4:30 and 4:30-5:30 (DMP 201).
- Wednesdays from 9-10 and 10-11 (DMP 201).

Teaching Assistants: Clement Fung, Hooman Hashemi, Siyuan He, Tanner Johnson, Angad Kalra, Aaron Mishkin, Xin Bei She, Sharan Vaswani, Nasim Zolaktaf, Zainab Zolaktaf

TA office hours (all in Demco Learning Centre):

- Mondays 1-2 (Siyuan at Table 3).
- Tuesdays 2-3 (Aaron at Table 1).
- Wednesdays 2-3 (Hooman at Table 2).
- Thursdays 2-3 (Clement at Table 4, with Aaron on weeks when assignments are due).
- Fridays 10-11 (Angad at Table 2).

**Synopsis**: We introduce basic principles and techniques in the fields of data mining and machine learning. These are some of the key tools behind the emerging field of data science and the popularity of the "big data" buzzword. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We'll focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.

**Registration**: Undergraduate and graduate students from any department are welcome to take the class. However, due to the high demand, only UBC computer science majors can directly register for the course. For all other students, **to enroll in the course you need to sign up for the wait list** (before September 14). Note that last year all students on the wait list were ultimately accepted into the course (but we did not have room for auditors).

**Prerequisites**:

- Basic algorithms and data structures (CPSC 221, or both of CPSC 260 and EECE 320 as well as one of CPSC 210, EECE 201, or EECE 309).
- Linear algebra (one of MATH 152, 221, or 223).
- Probability (one of STAT 200, STAT 203, STAT 241, STAT 251, STAT 302, MATH 302, MATH 318, or BIOL 300).
- Multivariate calculus (one of MATH 200, 217, 226, 253, or 263).

**Textbook**: There is no required textbook for the class. An introductory book that covers many (but not all) of the topics we will discuss is the Artificial Intelligence book of Russell and Norvig (AI:AMA) or the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive DataSets.

**Related Courses**:
Related courses in statistics include: STAT 305, STAT 306, STAT 406, STAT 460, STAT 461 (as well as EOSC 510). A discussion of the difference between CPSC 340 and these various STAT classes written by a former student (Geoff Roeder) is available here.

**Grading**: Assignments 30%, Midterm 20%, Final 50%.
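
As an illustration of how the weights above combine, a final course mark is a weighted average of the three components. This is a minimal sketch (the function name and 0-100 scale are assumptions, not an official grade calculator):

```python
def course_grade(assignments, midterm, final):
    """Weighted course grade: assignments 30%, midterm 20%, final 50%.

    Each component is assumed to be a percentage on a 0-100 scale.
    """
    return 0.30 * assignments + 0.20 * midterm + 0.50 * final

print(course_grade(80, 70, 90))  # → 83.0
```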

We will use Piazza for course-related questions.

**Topics**:

- Data representation and summarization.
- Supervised learning with frequencies and distances.
- Data clustering, outlier detection, and association rules.
- Linear prediction, regularization, and kernels.
- Latent-factor models and collaborative filtering.
- Neural networks and deep learning.

**Similar courses**:

- Machine Learning and Data Mining (UBC 2012)
- Machine Learning (Stanford)
- Introduction to Machine Learning (Alberta - Schuurmans)
- Practical Machine Learning (Berkeley)
- Machine Learning (MIT)
- Machine Learning (CMU)
- Course in Machine Learning (Maryland)
- Principles of Knowledge Discovery in Data (Alberta)
- Mining Massive Data Sets (Stanford)
- Data Mining (CMU)
