CPSC 532M, Term 2, Winter 2007-2008 Class Home Page

Topics in AI: Dynamic Programming


Late Breaking News:


Handouts

# Date Topic
1 April 2 Peer evaluation form for project presentations.
2 April 6 Description of the contents of your final project reports.
3 April 2 Sample project presentation.


Homeworks

Homework Submission Policy:

# Due Date Assignment Other Assignment Files Solution Other Solution Files
1 April 11 Problem Set none none Matlab files: solveQ4, which requires discreteLQR and continuousLQR

Course Overview

Dynamic Programming: In many complex systems we have access to controls, actions or decisions with which we can attempt to improve or optimize the behaviour of the system; for example, in the game of Tetris we rotate and shift (our control) the falling pieces to try to minimize the number of holes (our optimization objective) in the rows at the bottom of the board. When those decisions must be made sequentially, we may not be able to anticipate the long-term effect of a decision before the next must be made; in our example, should we use a piece to partially fill a hole even though a piece better suited to that hole might become available shortly?

Dynamic programming (DP) is a very general technique for solving such problems. Unlike many other optimization methods, DP can handle nonlinear, nonconvex and nondeterministic systems, works in both discrete and continuous spaces, and finds a globally optimal solution among those available. DP and closely related algorithms have been applied in many fields, and its instantiations include the following (a small illustrative sketch follows the list):

  • Dijkstra's algorithm for shortest path in a graph,
  • A* and branch-and-bound for graph search,
  • LQR for optimal control,
  • Kalman filtering for state estimation,
  • Viterbi algorithm for path estimation in Hidden Markov Models,
  • Optimal stopping for financial portfolio management,
  • Queue scheduling and inventory management,
  • The Hamilton-Jacobi(-Bellman)(-Isaacs) equation.
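
To make the common thread concrete, here is a minimal Matlab sketch of the first instantiation above, Dijkstra's algorithm, run on an invented 4-node graph. Each relaxation step is the dynamic programming principle at work: the cheapest path to a node must extend a cheapest path to one of its neighbours.

    % Dijkstra's algorithm for single-source shortest paths (illustrative).
    % A(i,j) is the cost of edge i->j; inf marks a missing edge.
    A = [inf   2   5 inf;
         inf inf   1   6;
         inf inf inf   2;
         inf inf inf inf];
    n = size(A,1); source = 1;
    dist = inf(n,1); dist(source) = 0;   % best known cost from the source
    visited = false(n,1);
    for k = 1:n
      d = dist; d(visited) = inf;        % consider only unvisited nodes
      [dmin, i] = min(d);
      if isinf(dmin), break; end         % remaining nodes are unreachable
      visited(i) = true;
      for j = 1:n                        % relax all edges leaving node i
        if dist(i) + A(i,j) < dist(j)
          dist(j) = dist(i) + A(i,j);
        end
      end
    end
    disp(dist')                          % expect [0 2 3 5]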

Approximate Dynamic Programming: Although several of the problems above take special forms that admit efficient algorithms, general DP suffers from the "Curse of Dimensionality": the computational cost grows exponentially with the dimension of the system. For example, a grid with just ten points along each axis of the state space contains 10^d nodes in d dimensions, so even a modest ten-dimensional system requires ten billion grid points.

Approximate DP (ADP) algorithms (including "neuro-dynamic programming" and others) are designed to approximate the benefits of DP without paying the computational cost. Among other applications, ADP has been used to play Tetris and to stabilize and fly an autonomous helicopter.
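
For a taste of how such approximations work, below is a Matlab sketch of one representative scheme, fitted value iteration with a linear-in-features value function; the toy chain problem, features and constants are invented for illustration and are not drawn from the course materials. Rather than tabulating the value of every state, we fit a three-parameter approximation to Bellman backups computed at a small sample of states.

    % Fitted value iteration with a linear value function approximation.
    % Toy chain: states 1..100, move left/right at stage cost 1, and
    % state 100 is a cost-free absorbing goal (all choices illustrative).
    nS = 100; gamma = 0.95;
    phi = @(s) [1; s/nS; (s/nS)^2];       % three polynomial features
    w = zeros(3,1);                       % approximation: Vhat(s) = phi(s)'*w
    samples = [1:10:nS, nS];              % fit at a small sample of states
    m = numel(samples);
    for iter = 1:200
      Phi = zeros(m,3); targets = zeros(m,1);
      for k = 1:m
        s = samples(k);
        Phi(k,:) = phi(s)';
        if s == nS
          targets(k) = 0;                 % absorbing goal costs nothing
        else
          sl = max(s-1,1); sr = min(s+1,nS);
          % Bellman backup evaluated through the current approximation:
          targets(k) = 1 + gamma * min(phi(sl)'*w, phi(sr)'*w);
        end
      end
      w = Phi \ targets;                  % least-squares fit of the backups
    end

The full table of values is never stored: the cost of each sweep scales with the number of samples and features rather than the size of the state space. (Such projected iterations can diverge for unlucky feature choices, one of the issues the course readings address.)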

Expectations: In addition to attending lectures, students will:

  • Read and discuss academic papers.
  • Complete several homework assignments involving both pencil-and-paper and programming components.
  • Lead class discussions on topics from course notes and/or research papers.
  • Write a midterm exam.
  • Complete a project involving DP or ADP. The course project will include a proposal, a presentation and a final report.




Course Details

Intended Audience: Graduate students in

Prerequisites: None officially.

Computer Science Breadth: This course does not count toward the computer science graduate breadth requirement.

Instructor: Ian Mitchell

Lectures: 3:30 - 5:00, Mondays and Wednesdays, ICICS/CS 238

Grades: Your final grade will be based on a combination of the homework assignments, class discussions, the midterm exam and the course project (see Expectations above).

The relative weighting will depend on the precise number of homework assignments and/or class discussions.

Text References: Some of these are available from the library or reading room. All can be borrowed temporarily from me.

Online References:


Lecture Plan:

In the first few lectures I will cover the basic concepts of DP: formulating the system model and optimization criterion, the value function and Dynamic Programming Principle (DPP), policies and feedback control, shortest path algorithms, and basic value and policy iterations.
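
As a preview, here is a minimal Matlab sketch of basic value iteration on a finite Markov decision process; the three-state transition matrices, stage costs and discount factor are invented purely for illustration.

    % Value iteration for a small discounted MDP (illustrative numbers).
    % P(:,:,a) is the transition matrix under action a; g(s,a) is stage cost.
    nS = 3; nA = 2; gamma = 0.9;
    P = zeros(nS,nS,nA);
    P(:,:,1) = [0.8 0.2 0.0; 0.1 0.8 0.1; 0.0 0.2 0.8];
    P(:,:,2) = [0.5 0.5 0.0; 0.0 0.5 0.5; 0.5 0.0 0.5];
    g = [2 1; 1 2; 0 3];
    V = zeros(nS,1);
    for iter = 1:1000
      Q = zeros(nS,nA);
      for a = 1:nA
        Q(:,a) = g(:,a) + gamma * P(:,:,a) * V;   % one-step lookahead
      end
      [Vnew, policy] = min(Q, [], 2);             % DPP: act greedily
      if max(abs(Vnew - V)) < 1e-8, break; end    % converged
      V = Vnew;
    end

The vector policy recovered on the last iteration is a feedback policy in the sense above: a map from each state to the action that is optimal there.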

After these lectures, we will run the course more like a reading group. In consultation with me, students may choose topics for which there are suitable notes and/or research papers; the class will read the material, and the student will then lead a discussion. Some of these topics are large, so students may choose a suitable subset on which to lead a discussion.

Topics that we will definitely cover (e.g., I will lead the discussion if nobody else wants to):

Topics that we will cover if somebody volunteers (e.g., I already know of suitable reading material):

Students are welcome to propose other topics, but may have to identify suitable reading material before they are included in the schedule.

Actual Schedule:

Reading Sources
Abbreviation Source
BS Bertsekas Slides
BB Dynamic Programming & Optimal Control by Bertsekas (Table of Contents)
BT Neuro-Dynamic Programming by Bertsekas and Tsitsiklis (Table of Contents)
CLRS Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein (Table of Contents)
DF De Farias Notes
HLADP Handbook of Learning and Approximate Dynamic Programming edited by Si, Barto, Powell and Wunsch (Table of Contents)
VR Van Roy Notes

# Date Topic Links Required Readings Optional Readings
1 Jan 9 Introduction. System representation. none. VR lecture 1, sections 1-3. BS lecture 1; DF lecture 1, sections 1, 3; BB 1.1, 1.4
2 Jan 14 We will adjourn the class to the IAM Distinguished Lecture: Prof. Richard Baraniuk, speaking on Compressive Signal Processing. Note special time & location: LSK 301, 3pm - 4pm. This topic is not directly related to dynamic programming, but maybe it could be...
3 Jan 16 Optimality criteria (finite horizon, discounting). Value function. Dynamic programming principle. Feedback policies. none. BS lecture 2; VR lecture 1, section 4, lecture 2, sections 1-2. DF lecture 1, sections 2, 4, 5; BB 1.2-1.3; BT 2.1.0-2.1.1
4 Jan 21 Transforming finite DP into graph shortest path. none. BS lecture 3. BB 2; BT 2.2.1
5 Jan 23 DP for solving graph shortest path: basic label correcting algorithm. none. BS lecture 4. BB 2; BT 2.2.1; CLRS 25
6 Jan 28 Various examples of label correcting algorithms. Efficiency improvements. Infinite horizon problems. none. BS lecture 4; DF lecture 1, section 5; VR lecture 2, section 2. BS lecture 17, lecture 18; BB 2; BT 2.2.1; CLRS 25
7 Jan 30 Value iteration. Policy iteration. Value iteration applet from David Poole's Interactive Demos. DF lecture 1, section 6, lecture 2, lecture 3; VR lecture 2, sections 3-4, lecture 3. BT 2.2.2-2.2.3, 3, 5-6; BB 6.3-6.5.
8 Feb 4 Neuro-dynamic programming overview. Discrete time Linear Quadratic Regulator (LQR) optimal control. none. D. P. Bertsekas "Neuro-dynamic Programming", Encyclopedia of Optimization (Kluwer, 2001); D. P. Bertsekas "Neuro-dynamic Programming: an Overview" slides; Stephen Boyd's notes on discrete time LQR; BS lecture 5. D. P. Bertsekas "Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC" European J. Control, v. 11, n. 4-5 (2005). BB 4.1.
9 Feb 6 Infinite horizon and continuous time LQR optimal control. none. Stephen Boyd's notes on infinite horizon LQR and continuous time LQR. BB 4.1.
10 Feb 11 We will adjourn the class to the IAM Seminar: Prof. Michael Friedlander, speaking on Algorithms for Large-Scale Sparse Reconstruction. Note special time & location: LSK 301, 3pm - 4pm. This talk will describe how to solve the problems that were posed in Baraniuk's talk from January 14.
11 Feb 13 TD Learning (Jacek Kisynski). none. DF lecture 13 and lecture 14. VR lecture 9 and lecture 10, sections 1-3. BS lecture 23.
Feb 18-22 Midterm Break (no class)
12 Feb 25 Kalman filters (M. Emtiyaz Khan). Emtiyaz's Kalman filter demo: KFdemo.m and KF.m. Emtiyaz's slides. Stephen Boyd's notes on Kalman filters. BS lecture 11. A short note by Emtiyaz on information filters.
13 Feb 27 Hamilton-Jacobi equation for nonlinear optimal control (Ivan Sham). Optimal Stopping (Amit Goyal). none. HJ: Ivan's slides. Optimal Stopping: Amit's slides. HJ: BS lecture 7. Optimal Stopping: BS lecture 6; DF lecture 15; VR lecture 10, section 4.
14 March 3 Eikonal equation for continuous shortest path (Josna Rao). Q-factors and Q-learning (Stephen Pickett). Jamie Sethian's web page has lots of Fast Marching related material, including an applet demonstrating a continuous version of the travelling salesman problem. Eikonal equation: Josna's slides. Q-learning: Stephen's notes. Q-learning: BS lecture 24; DF lecture 7, section 2 and lecture 8, section 2; VR lecture 5, lecture 6 and lecture 7.
15 March 5 Value function approximation with Linear Programming (Jonatan Schroeder). Some of David Poole's interactive applets (Jacek Kisynski). David Poole's Interactive Demos. Linear Programming: Jonatan's slides. none.
16 March 10 Value function approximation with neural networks (Mark Schmidt). Differential dynamic programming (Sang Hoon Yeo). Function approximation: Mark's Matlab code. May require minFunc. Function approximation: Mark's slides. DDP: Sang Hoon's slides. DDP: "Random Sampling of States in Dynamic Programming", Christopher G. Atkeson & Benjamin Stephens, NIPS 2007.
17 March 12 Policy search / reinforcement learning method PEGASUS for helicopter control (Ken Alton). Schemes for solving stationary Hamilton-Jacobi PDEs: Fast Marching, sweeping, transformation to time-dependent form. autonomous helicopter website (including videos). Pegasus: Ken's slides. none.
18 March 17 ADP in sensor networks (Jonatan Schroeder) and LiveVessel (Josna Rao) none. Jonatan's slides. Josna's slides. Sensor Networks: Jason L. Williams, John W. Fisher III, Alan S. Willsky, "Approximate Dynamic Programming for Communication-Constrained Sensor Network Management," IEEE Trans. Signal Processing, v. 55, n. 8, pp. 4300-4311 (August 2007). LiveWire: William A. Barrett & Eric N. Mortensen, "Interactive Live-Wire Boundary Extraction," Medical Image Analysis, v. 1, n. 4, pp. 331-341 (Sept 1997). LiveVessel: Kelvin Poon, Ghassan Hamarneh & Rafeef Abugharbieh, "Live-Vessel: Extending Livewire for Simultaneous Extraction of Optimal Medial and Boundary Paths in Vascular Images," Medical Image Computing and Computer-Assisted Intervention (MICCAI), LNCS 4792, pp. 444-451 (2007).
19 March 19 ADP for Tetris (Ivan Sham) and ADP with Diffusion Wavelets and Laplacian Eigenfunctions (Ian). singular value decomposition (SVD) based image compression demo. Ivan's slides. Vivek F. Farias & Benjamin Van Roy, "Tetris: A Study of Randomized Constraint Sampling," Probabilistic and Randomized Methods for Design Under Uncertainty (Calafiore & Dabbene eds.) Springer-Verlag (2006). Dimitri P. Bertsekas & Sergey Ioffe, "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming," Report LIDS-P-2349, MIT (1996). Daniela de Farias & Benjamin Van Roy, "The Linear Programming Approach to Approximate Dynamic Programming," Operations Research, v. 51, n. 6, pp. 850-856 (2003). Sridhar Mahadevan & Mauro Maggioni, "Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions," Neural Information Processing Systems (NIPS), MIT Press (2006).
March 24 Easter Monday (no class)
March 26 Class cancelled. Work on your projects.
20 March 31 Rating game players with DP (Stephen Pickett) and Hierarchical discretization with DP (Amit Goyal). Mark Glickman's research web page. Amit's slides. Mark Glickman, "Paired Comparison Models with Time-Varying Parameters", Harvard Dept. Statistics Ph.D. thesis (1993). Ching-Cheng Shen & Yen-Liang Chen, "A Dynamic Programming Algorithm for Hierarchical Discretization of Continuous Attributes," European J. Operational Research, v. 184, n. 2, pp. 636-651 (January 2008).
21 April 2 DP-like Suboptimal Control: Certainty Equivalent Control (CEC), Open-Loop Feedback Control (OLFC), limited lookahead. none. BS lecture 13 (CEC) and lecture 14 (limited lookahead). BB 6.0-6.3
22 April 7 DP-like Suboptimal Control: Rollout, model predictive control and receding horizon. Example project presentation. BS lecture 15 (rollout) BB 6.4-6.5
23 April 9 Project Presentations (peer evaluation form)
April 15-29 Final Exam Period. Projects due 3pm Friday April 25.


Links:


CPSC 532M Term 1 Winter 2007-2008 Class Page
maintained by Ian Mitchell