Data mining: What is it really?

Data mining is all about two things: automation (computers, programming, etc.) and inference (statistics, machine learning, etc.). In other words, given a huge amount of data, how can we build tools to learn from it automatically? Very general tools have been developed to address these kinds of problems. In this class you’ll learn about them in a hands-on, practical way: you will actually build the tools you will need in the future when you “do” data mining.

This course is designed to give you the tools to implement and understand the kinds of inference algorithms one might wish to use in data mining applications. For this reason, the course could just as well be titled “Introductory Statistical Machine Learning.”

We will cover graphical models, inference in graphical models, sampling, variational inference, and then a raft of specific models for clustering, regression, and classification.

You can expect to learn not only what techniques are out there, but also how to implement and extend them. You will be tested on this through challenging programming assignments and, at the end of the course, a final project in which you apply what you have learned to a subject of your own choosing.

Term: Spring 2012
Time: Tu-Th, 6:10pm-7:25pm
Location: TBA
Professor: Frank Wood
Email: fwood@stat.columbia.edu
Office:
Room 1017
School of Social Work
Office Hours:
Tu 7:25pm-8:25pm
420 Pupin
TAs:
Jingjing Zou
Email: jingjing@stat.columbia.edu
Hours:
W: 10:30am-12pm SSW 902
W: 6-8:30pm SSW 902