CPSC 525: Course Outline and Reading List

Instructor: David Lowe
January-April 2014

Course home page: http://www.cs.ubc.ca/~lowe/525

Textbook: While most of the course is based on original research papers, we will also consult the following textbook by Richard Szeliski. It is available for free on-line, or can be purchased in printed form.

Computer Vision: Algorithms and Applications by Richard Szeliski


The following is a tentative list of topics and readings for the course. It will be changed and updated as the course proceeds.

Introduction

The first class will provide an overview of the computer vision field and its applications.
  • Read Chapter 1 of Szeliski's book for an introduction to computer vision and a brief history of the field.

Stereo vision

Epipolar geometry and rectification. Correlation and feature matching. Discussion of the first assignment. Belief propagation.
  • Pascal Fua, "A parallel stereo algorithm that produces dense depth maps and preserves image features," Machine Vision and Applications, 6 (1993), 35--49. [PDF]
  • D. Scharstein and R. Szeliski. "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms," International Journal of Computer Vision, 47 (2002), pp. 7-42. [Web site with data and source code] [PDF]

  • Pedro Felzenszwalb and Daniel Huttenlocher, "Efficient Belief Propagation for Early Vision," Conference on Computer Vision and Pattern Recognition (CVPR), 2004. [Presentation] [Web site with source code] [PDF]

Image matching and recognition with invariant local features

Interest points. Rotation, scale, and illumination invariance. Image region descriptors. RANSAC. The Hough transform.
  • Section 4.1, Feature Detection and Matching, from Szeliski's book

  • David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110. [PDF]

  • M. Calonder, V. Lepetit, C. Strecha, and P. Fua, "BRIEF: Binary Robust Independent Elementary Features," European Conference on Computer Vision (ECCV), 2010. [PDF]

  • Articles from Wikipedia: RANSAC; The Hough Transform
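
As an illustration of the RANSAC topic listed above, the following is a minimal sketch (not taken from the course materials) that fits a 2D line to points contaminated with outliers. The function names, the iteration count, and the inlier threshold are illustrative assumptions.

```python
import random


def fit_line(p, q):
    """Line a*x + b*y + c = 0 through p and q, scaled so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # the two sample points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    """Return the line with the largest inlier set found (hypothetical defaults)."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iters):
        # 1. Draw a minimal sample: two points define a line.
        p, q = rng.sample(points, 2)
        line = fit_line(p, q)
        if line is None:
            continue
        a, b, c = line
        # 2. Count points whose perpendicular distance is within the threshold.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        # 3. Keep the hypothesis with the most support.
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers
```

In practice the final model is usually re-estimated by least squares over all inliers; that step is omitted here for brevity.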

Image registration and 3D reconstruction

Non-linear least-squares with Gauss-Newton. Levenberg-Marquardt. Robust solutions. Solving for 3D structure and camera pose. Dense surface reconstruction.
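
As a small illustration of the Gauss-Newton topic, the sketch below fits the two-parameter model y = a * exp(b * x) by repeatedly solving the 2x2 normal equations. The model, function name, starting values, and iteration count are assumptions chosen for the example, not part of the course materials.

```python
import math


def gauss_newton_exp(xs, ys, a=1.0, b=0.0, n_iters=20):
    """Fit y = a * exp(b * x) by Gauss-Newton (illustrative sketch)."""
    for _ in range(n_iters):
        # Accumulate J^T J (s_*) and J^T r (g_*) over all data points.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e              # residual
            ja, jb = e, a * x * e      # partial derivatives of the model
            s_aa += ja * ja
            s_ab += ja * jb
            s_bb += jb * jb
            g_a += ja * r
            g_b += jb * r
        det = s_aa * s_bb - s_ab * s_ab
        if abs(det) < 1e-12:
            break  # normal equations are (near) singular
        # Solve the 2x2 system (J^T J) delta = J^T r by Cramer's rule.
        a += (s_bb * g_a - s_ab * g_b) / det
        b += (s_aa * g_b - s_ab * g_a) / det
    return a, b
```

Levenberg-Marquardt modifies this update by adding a damping term to the diagonal of J^T J, which makes the step more robust when the starting point is far from the solution.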

Matching and recognition in large datasets

Scaling recognition to large image collections. K-means clustering algorithm. K-d trees. Approximate nearest-neighbour matching in high-dimensional spaces. FLANN.
  • David Nister and Henrik Stewenius, "Scalable recognition with a vocabulary tree," Conference on Computer Vision and Pattern Recognition, 2006. [PDF]

  • Marius Muja and David G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," International Conference on Computer Vision Theory and Applications (VISAPP), 2009. [PDF] [Source code]

  • Articles from Wikipedia: K-means clustering; K-d trees
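
For the K-means clustering topic above, here is a minimal sketch of the standard Lloyd iteration in plain Python. It is an illustration only; the function names, the random initialisation from data points, and the iteration cap are assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centre update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its previous centre).
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, clusters
```

The result depends on the initial centres, so in practice the algorithm is often run from several random starts.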

Learning to recognize object categories

Face detection. The AdaBoost algorithm. Learning generative and discriminative models. The bag-of-features approach versus learned geometry. Object segmentation from recognition.
  • Paul Viola and Michael Jones, "Rapid object detection using a boosted cascade of simple features," Conference on Computer Vision and Pattern Recognition, 2001, pp. 511-518. [PDF]
    For background on AdaBoost, read Freund and Schapire, "A short introduction to boosting," JJSAI, 1999. [PDF]

  • Chapter 14, Recognition, from Szeliski's book

  • Li Fei-Fei, Rob Fergus, Antonio Torralba, "ICCV 2009 Short Course: Recognizing and Learning Object Categories." [Course page, including Matlab code]

  • Optional: Bastian Leibe, Edgar Seemann, and Bernt Schiele, "Pedestrian detection in crowded scenes," CVPR 2005, San Diego (June 2005). [PDF]

Scene perception

Recognition of scene categories. Recognition from low-resolution images. Discriminative features for location recognition.
  • S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories," IEEE Conference on Computer Vision and Pattern Recognition, New York (June 2006). [PDF]

  • A. Torralba, R. Fergus, W. T. Freeman, "80 million tiny images: a large dataset for non-parametric object and scene recognition," PAMI, 30, 11 (2008). [PDF]

  • Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros, "What Makes Paris Look like Paris?" SIGGRAPH (2012). [PDF] [Project page]

Motion tracking and interpretation

Measuring optical flow. Structure from motion. Kalman filter and estimation theory. Color histograms. Tracking with particle filters. Action recognition.
  • Andrew J. Davison, Ian Reid, Nicholas Molton and Olivier Stasse, "MonoSLAM: Real-Time Single Camera SLAM," IEEE PAMI, (June 2007). [PDF] [Davison's web site]
  • P. Pérez, C. Hue, J. Vermaak and M. Gangnet, "Color-based probabilistic tracking," European Conference on Computer Vision, ECCV 2002, Copenhagen, Denmark (June 2002). [PDF]

  • Alexei A. Efros, Alexander C. Berg, Greg Mori and Jitendra Malik, "Recognizing Action at a Distance," International Conference on Computer Vision, Nice, France (2003). [PDF]

Neurophysiology of vision

Structure of the visual cortex. Higher-level neurophysiology of vision. "What" vs. "where" pathways in the brain. Models of recognition in the brain.
  • Simon A.J. Winder, "A brief survey of central mechanisms in primate visual perception," (2002). [PDF]

  • R. Quiroga, et al., "Invariant visual representation by single neurons in the human brain," Nature (2005). [PDF]

  • T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber and T. Poggio, "Object recognition with cortex-like mechanisms," IEEE PAMI (2007). [PDF]

Deep Learning for Vision

The back-propagation algorithm. Convolutional nets. Applications to object category recognition.
  • Yann LeCun, Marc'Aurelio Ranzato, "Deep Learning Tutorial," ICML (2013). [PDF]

  • Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS (2012). [PDF]

  • M.D. Zeiler, R. Fergus, "Visualizing and Understanding Convolutional Networks," arXiv:1311.2901 (November 2013). [PDF]

  • Andrej Karpathy, "ConvNetJS: Deep Learning in your browser." A JavaScript library for training deep networks directly in the browser.

  • Articles from Wikipedia: The backpropagation algorithm; Deep learning

Colour vision (optional)

Colour spaces. Colour constancy. The use of colour for recognition.
  • Section 2.3.2, Color, from Szeliski's book

  • Brian Funt, Kobus Barnard and Lindsay Martin, "Is colour constancy good enough?" European Conference on Computer Vision, (1998), pp. 445-459. [PDF]

Project presentations

The final few classes will consist of project presentations.


This page is maintained by David Lowe.