CPSC 532M, Term 2, Winter 2007-2008 Class Home Page

Topics in AI: Dynamic Programming

Contents of this page

Late breaking news
Things to print
- Handouts
- Homeworks
Course Overview
Course Details
Tentative Schedule
Course Projects
Related Links

Late Breaking News:

2008/05/04: Final grades have been submitted. I need to keep your final reports, but you are welcome to come by my office to pick up your homeworks and discuss your projects (and make a copy if you wish).
2008/05/04: Matlab files solving question 4 from the homework have been posted in the Homeworks section.
2008/04/06: A example project presentation and a description of your project report has been posted in the handouts section.
2008/04/02: A peer review sheet has been posted for the project presentations. Take a look at it to see what you will be expected to include in your presentation.
2008/03/03: The long promised homework 1 has been posted. Let me know if you find any bugs.
2008/02/19: I had promised an assignment, but I leant both of my copies of Bertsekas' optimal control book, so I cannot look for reasonable problems. I will get something out after the midterm break. In the mean time, please get me your rough project idea emails.
2008/01/29: The Tentative Schedule has been updated.
2008/01/14: Today's class is adjourned to the IAM distinguished lecture, 3pm at LSK 301. More details in the tenative schedule.
2008/01/09: I changed my mind. The first lecture will be Wednesday January 9.

Handouts

#	Date	Topic
1	April 2	Peer evaluation form for project presentations.
1	April 6	Description of the contents of your final project reports.
1	April 2	Sample project presentation.

Homeworks

Homework Submission Policy:

Get it in soon or I can't release solutions.
Get it in by the end of the semester, or you won't get a grade.

#	Due Date	Assignment	Other Assignment Files	Solution	Other Solution Files
1	April 11	Problem Set	none	none	Matlab files: solveQ4, which requires discreteLQR and continuousLQR

Course Overview

Dynamic Programming: In many complex systems we have access to a controls, actions or decisions with which we can attempt to improve or optimize the behaviour of that system; for example, in the game of Tetris we seek to rotate and shift (our control) the position of falling pieces to try to minimize the number of holes (our optimization objective) in the rows at the bottom of the board. If those decisions must be made sequentially, we may not be able to anticipate the long-term effect of a decision before the next must be made; in our example, should we use a piece to partially fill a hole even though a piece better suited to that hole might be available shortly?

Dynamic programming (DP) is a very general technique for solving such problems. Unlike many other optimization methods, DP can handle nonlinear, nonconvex and nondeterministic systems, works in both discrete and continuous spaces, and locates the global optimum solution among those available. DP or closely related algorithms have been applied in many fields, and among its instantiations are:

Dijkstra's algorithm for shortest path in a graph,
A* and branch-and-bound for graph search,
LQR for optimal control,
Kalman filtering for state estimation,
Viterbi algorithm for path estimation in Hidden Markov Models,
Optimal stopping for financial portfolio management,
Queue scheduling and inventory management,
The Hamilton-Jacobi(-Bellman)(-Isaacs) equation.

Approximate Dynamic Programming: Although several of the problems above take special forms, general DP suffers from the "Curse of Dimensionality": the computational complexity grows exponentially with the dimension of the system.

Approximate DP (ADP) algorithms (including "neuro-dynamic programming" and others) are designed to approximate the benefits of DP without paying the computational cost. Among other applications, ADP has been used to play Tetris and to stabilize and fly an autonomous helicopter.

Expectations: In addition to attending lectures, students will:

Read and discuss academic papers.
Complete several homework assignments involving both paper and pencil and programming components.
Lead class discussions on topics from course notes and/or research papers.
Write a midterm exam.
Complete a project involving DP or ADP. The course project will include a proposal, a presentation and a final report.

Course Details

Intended Audience: Graduate students in

Computer Science,
Applied Mathematics,
Engineering and other application fields.

Prerequisites: None officially.

Students should be comfortable with basic probability and linear algebra, and should have seen difference equations (such as Markov Decision Processes), differential equations (ODEs), multivariable calculus and introductory numerical methods.
Course projects may be programmed in the language of the student's choosing, although programming is not a required component of projects.
If you are in doubt, come to the first class or see me.

Computer Science Breadth: This course does not count toward the computer science graduate breadth requirement.

Instructor: Ian Mitchell

Email: mitchell (at) cs (dot) ubc (dot) ca
Office Location: ICICS/CS 217
Office Phone: (604) 822-2317
Office Hours: TBA (and by appointment)

Lectures: 3:30 - 5:00, Mondays and Wednesdays, ICICS/CS 238

Location is subject to change; check here or the CS Graduate Course web page or UBC Course Web Site for updates.
There are no lectures Monday February 18 to Friday February 22 (Midterm break).
There is no lecture Monday March 24 (Easter Monday).
There are no scheduled labs or tutorials for this course.

Grades: Your final grade will be based on a combination of

3-5 homework assignments and/or leading a class discussion
1 midterm exam
1 project presentation
1 project report

The relative weighting will depend on the precise number of homework assignments and/or class discussions.

Text References: Some of these are available from the library or reading room. All can be borrowed temporarily from me.

Dynamic Programming by Bellman (Table of Contents).
Dynamic Programming & Optimal Control by Bertsekas (Table of Contents).
Neuro-Dynamic Programming by Bertsekas and Tsitsiklis (Table of Contents).
Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein (Table of Contents).
Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization by Isaacs (Table of Contents).
Handbook of Learning and Approximate Dynamic Programming edited by Si, Barto, Powell and Wunsch (Table of Contents).

Online References:

Wikipedia entry on Dynamic Programming.
MIT OpenCourseWare 2.997: Decision Making in Large Scale Systems taught by Daniela Pucci De Farias.
MIT OpenCourseWare 6.231: Dynamic Programming and Stochastic Control taught by Dimitri Bertsekas.
Stanford MS&E 339: Approximate Dynamic Programming taught by Ben Van Roy.
Stanford CS 229: Machine Learning taught by Andrew Ng. This course is primarily machine learning, but the final major topic (Reinforcement Learning and Control) has a DP connection.
INSEAD Dynamic Programming via Applications taught by Ioana Popescu. Business oriented. Check out the "Fun Applets" at the bottom of the Handouts page.

Lecture Plan:

In the first few lectures I will cover the basic concepts of DP: formulating the system model and optimization criterion, the value function and Dynamic Programming Principle (DPP), policies and feedback control, shortest path algorithms, and basic value and policy iterations.

After these lectures, we will run the course more like a reading group. In consultation with me, students may choose topics for which there are suitable notes and/or research papers, the class will read the material, and then the student will lead a discussion. Some of these topics are large, so students can choose some suitable subset on which to lead a discussion.

Topics that we will definitely cover (eg: I will lead the discussion if nobody else wants to):

Q-learning and Temporal-Difference Learning.
Rollout, limited lookahead and model predictive control.
Optimal control in continuous time and space.
LQR for linear optimal control.
Kalman filters for linear state estimation.
The Pontryagin minimum principle.
Eikonal equation for shortest path in continuous state space and the Fast Marching Method for solving it.

Topics that we will cover if somebody volunteers (eg: I already know of suitable reading material):

DP for financial portfolio selection and optimal stopping for pricing derivatives.
Exploration vs exploitation in learning.
Approximate linear programming and Tetris.
Constraint sampling and/or factored MDPs for approximate linear programming.
Policy search method PEGASUS, reinforcement learning and helicopter control.
Neural networks and/or SVMs for value function approximation.
Lyapunov functions for proving convergence.
Viterbi algorithm for decoding, speech recognition, bioinformatics, etc.
Extended and/or unscented Kalman filters and the information filter.

Students are welcome to propose other topics, but may have to identify suitable reading material before they are included in the schedule.

Actual Schedule:

I will fill in this table as we progress through the term.
Topics of future lectures are subject to change.
Some readings and/or links may not be operational from computers outside the UBC domain. If you have problems, please contact the instructor.

Reading Sources
Abbreviation	Source
BS	Bertsekas Slides
BB	Dynamic Programming & Optimal Control by Bertsekas (Table of Contents)
BT	Neuro-Dynamic Programming by Bertsekas and Tsitsiklis (Table of Contents)
CLRS	Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein (Table of Contents)
DF	De Farias Notes
HLADP	Handbook of Learning and Approximate Dynamic Programming edited by Si, Barto, Powell and Wunsch (Table of Contents)
VR	Van Roy Notes

# Date Topic Links Required Readings Optional Readings

1 Jan 9 Introduction. System representation. none. VR lecture 1, sections 1-3. BS lecture 1; DF lecture 1, sections 1, 3; BB 1.1, 1.4

2 Jan 14 We will adjorn the class to the IAM Distinguished Lecture: Prof. Richard Baraniuk, speaking on Compressive Signal Processing. Note special time & location: LSK 301, 3pm - 4pm. This topic is not directly related to dynamic programming, but maybe it could be...

3 Jan 16 Optimality criteria (finite horizon, discounting). Value function. Dynamic programming principle. Feedback policies. none. BS lecture 2; VR lecture 1, section 4, lecture 2, sections 1-2. DF lecture 1, sections 2, 4, 5; BB 1.2-1.3; BT 2.1.0-2.1.1

4 Jan 21 Transforming finite DP into graph shortest path. none. BS lecture 3. BB 2; BT 2.2.1

5 Jan 23 DP for solving graph shortest path: basic label correcting algorithm. none. BS lecture 4. BB 2; BT 2.2.1; CLRS 25

6 Jan 28 Various examples of label correcting algorithms. Efficiency improvements. Infinite horizon problems. none. BS lecture 4; DF lecture 1, section 5; VR lecture 2, section 2. BS lecture 17, lecture 18; BB 2; BT 2.2.1; CLRS 25

7 Jan 30 Value iteration. Policy iteration. Value iteration applet from David Poole's Interactive Demos. DF lecture 1, section 6, lecture 2, lecture 3; VR lecture 2, sections 3-4, lecture 3. BT 2.2.2-2.2.3, 3, 5-6; BB 6.3-6.5.

8 Feb 4 Neuro-dynamic programming overview. Discrete time Linear Quadratic Regulator (LQR) optimal control. none. D. P. Bertsekas "Neuro-dynamic Programming", Encyclopedia of Optimization (Kluwer, 2001); D. P. Bertsekas "Neuro-dynamic Programming: an Overview" slides; Stephen Boyd's notes on discrete time LQR; BS lecture 5. D. P. Bertsekas "Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC" European J. Control, v. 11, n. 4-5 (2005). BB 4.1.

9 Feb 6 Infinite horizon and continuous time LQR optimal control. none. Stephen Boyd's notes on infinite horizon LQR and continuous time LQR. BB 4.1.

10 Feb 11 We will adjorn the class to the IAM Seminar: Prof. Michael Friedlander, speaking on Algorithms for Large-Scale Sparse Reconstruction. Note special time & location: LSK 301, 3pm - 4pm. This talk will describe how to solve the problems that were posed in Baraniuk's talk from January 14.

11 Feb 13 TD Learning (Jacek Kisynski). none. DF lecture 13 and lecture 14. VR lecture 9 and lecture 10, sections 1-3. BS lecture 23.

Feb 18-22 Midterm Break (no class)

12 Feb 25 Kalman filters (M. Emtiyaz Khan). Emtiyaz's Kalman filter demo: KFdemo.m and KF.m Emtiyaz's slides. Stephen Boyd's notes on Kalman filters. BS lecture 11. A short note by Emtiyaz on information filters.

13 Feb 27 Hamilton-Jacobi equation for nonlinear optimal control (Ivan Sham). Optimal Stopping (Amit Goyal). none. HJ: Ivan's slides. Optimal Stopping: Amit's slides. HJ: BS lecture 7. Optimal Stopping: BS lecture 6; DF lecture 15; VR lecture 10, section 4.

14 March 3 Eikonal equation for continuous shortest path (Josna Rao). Q-factors and Q-learning (Stephen Pickett). Jamie Sethian's web page has lots of Fast Marching related material, including an applet demonstrating a continuous version of the travelling salesman problem. Eikonal equation: Josna's slides. Q-learning: Stephen's notes. Q-learning: BS lecture 24; DF lecture 7, section 2 and lecture 8, section 2; VR lecture 5, lecture 6 and lecture 7.

15 March 5 Value function approximation with Linear Programming (Jonatan Schroeder). Some of David Poole's interactive applets (Jacek Kisynski). David Poole's Interactive Demos. Linear Programming: Jonatan's slides. none.

16 March 10 Value function approximation with neural networks (Mark Schmidt). Differential dynamic programming (Sang Hoon Yeo). Function approximation: Mark's Matlab code. May require minFunc. Function approximation: Mark's slides. DDP: Sang Hoon's slides. DDP: "Random Sampling of States in Dynamic Programming", Christopher G. Atkeson & Benjamin Stephens, NIPS 2007.

17 March 12 Policy search / reinforcement learning method PEGASUS for helicopter control (Ken Alton). Schemes for solving stationary Hamilton-Jacobi PDEs: Fast Marching, sweeping, transformation to time-dependent form. autonomous helicopter website (including videos). Pegasus: Ken's slides. none.

18 March 17 ADP in sensor networks (Jonatan Schroeder) and LiveVessel (Josna Rao) none. Jonatan's slides. Josna's slides. Sensor Networks: Jason L. Williams, John W. Fisher III, Alan S. Willsky, "Approximate Dynamic Programming for Communication-Constrained Sensor Network Management," IEEE Trans. Signal Processing, v. 55, n. 8, pp. 4300-4311 (August 2007). LiveWire: William A. Barrett & Eric. N. Mortensen, "Interactive Live-Wire Boundary Extraction," Medical Image Analysis, v. 1, n. 4, pp. 331-341 (Sept 1997). LiveVessel: Kelvin Poon, Ghassan Hamarneh & Rafeef Abugharbieh, "Live-Vessel: Extending Livewire for Simultaneous Extraction of Optimal Medial and Boundary Paths in Vascular Images," Medical Image Computing and Computer-Assisted Intervention (MICCAI), LNCS 4792, pp. 444-451 (2007).

19 March 19 ADP for Tetris (Ivan Sham) and ADP with Diffusion Wavelets and Laplacian Eigenfunctions (Ian). singular value decomposition (SVD) based image compression demo. Ivan's slides. Vivek F. Farias & Benjamin Van Roy, "Tetris: A Study of Randomized Constraint Sampling," Probabilistic and Randomized Methods for Design Under Uncertainty (Calafiore & Dabbene eds.) Springer-Verlag (2006). Dimitri P. Bertsekas & Sergey Ioffe, "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming," Report LIDS-P-2349, MIT (1996).. Daniela de Farias & Benjamin Van Roy, "The Linear Programming Approach to Approximate Dynamic Programming," Operations Research, v. 51, n. 6, pp. 850-856 (2003). Sridhar Mahadevan & Mauro Maggioni, "Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions," Neural Information Processing Systems (NIPS), MIT Press (2006).

March 24 Easter Monday (no class)

March 26 Class cancelled. Work on your projects.

20 March 31 Rating game players with DP (Stephen Pickett) and Hierarchical discretization with DP (Amit Goyal). Mark Glickman's research web page. Amit's slides. Mark Glickman, "Paired Comparison Models with Time-Varying Parameters", Harvard Dept. Statistics Ph.D. thesis (1993). Ching-Cheng Shen & Yen-Liang Chen, "A Dynamic Programming Algorithm for Hierarchical Discretization of Continuous Attributes," European J. Operational Research, v. 184, n. 2, pp. 636-651 (January 2008).

21 April 2 DP-like Suboptimal Control: Certainty Equivalent Control (CEC), Open-Loop Feedback Control (OLFC), limited lookahead. none. BS lecture 13 (CEC) and lecture 14 (limited lookahead). BB 6.0-6.3

23 April 7 DP-like Suboptimal Control: Rollout, model predictive control and receding horizon. Example project presentation. BS lecture 15 (rollout) BB 6.4-6.5

24 April 9 Project Presentations (peer evaluation form)

April 15-29 Final Exam Period. Projects due 3pm Friday April 25.

#	Date	Topic	Links	Required Readings	Optional Readings
1	Jan 9	Introduction. System representation.	none.	VR lecture 1, sections 1-3.	BS lecture 1; DF lecture 1, sections 1, 3; BB 1.1, 1.4
2	Jan 14	We will adjorn the class to the IAM Distinguished Lecture: Prof. Richard Baraniuk, speaking on Compressive Signal Processing. Note special time & location: LSK 301, 3pm - 4pm. This topic is not directly related to dynamic programming, but maybe it could be...
3	Jan 16	Optimality criteria (finite horizon, discounting). Value function. Dynamic programming principle. Feedback policies.	none.	BS lecture 2; VR lecture 1, section 4, lecture 2, sections 1-2.	DF lecture 1, sections 2, 4, 5; BB 1.2-1.3; BT 2.1.0-2.1.1
4	Jan 21	Transforming finite DP into graph shortest path.	none.	BS lecture 3.	BB 2; BT 2.2.1
5	Jan 23	DP for solving graph shortest path: basic label correcting algorithm.	none.	BS lecture 4.	BB 2; BT 2.2.1; CLRS 25
6	Jan 28	Various examples of label correcting algorithms. Efficiency improvements. Infinite horizon problems.	none.	BS lecture 4; DF lecture 1, section 5; VR lecture 2, section 2.	BS lecture 17, lecture 18; BB 2; BT 2.2.1; CLRS 25
7	Jan 30	Value iteration. Policy iteration.	Value iteration applet from David Poole's Interactive Demos.	DF lecture 1, section 6, lecture 2, lecture 3; VR lecture 2, sections 3-4, lecture 3.	BT 2.2.2-2.2.3, 3, 5-6; BB 6.3-6.5.
8	Feb 4	Neuro-dynamic programming overview. Discrete time Linear Quadratic Regulator (LQR) optimal control.	none.	D. P. Bertsekas "Neuro-dynamic Programming", Encyclopedia of Optimization (Kluwer, 2001); D. P. Bertsekas "Neuro-dynamic Programming: an Overview" slides; Stephen Boyd's notes on discrete time LQR; BS lecture 5.	D. P. Bertsekas "Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC" European J. Control, v. 11, n. 4-5 (2005). BB 4.1.
9	Feb 6	Infinite horizon and continuous time LQR optimal control.	none.	Stephen Boyd's notes on infinite horizon LQR and continuous time LQR.	BB 4.1.
10	Feb 11	We will adjorn the class to the IAM Seminar: Prof. Michael Friedlander, speaking on Algorithms for Large-Scale Sparse Reconstruction. Note special time & location: LSK 301, 3pm - 4pm. This talk will describe how to solve the problems that were posed in Baraniuk's talk from January 14.
11	Feb 13	TD Learning (Jacek Kisynski).	none.	DF lecture 13 and lecture 14.	VR lecture 9 and lecture 10, sections 1-3. BS lecture 23.
	Feb 18-22	Midterm Break (no class)
12	Feb 25	Kalman filters (M. Emtiyaz Khan).	Emtiyaz's Kalman filter demo: KFdemo.m and KF.m	Emtiyaz's slides. Stephen Boyd's notes on Kalman filters.	BS lecture 11. A short note by Emtiyaz on information filters.
13	Feb 27	Hamilton-Jacobi equation for nonlinear optimal control (Ivan Sham). Optimal Stopping (Amit Goyal).	none.	HJ: Ivan's slides. Optimal Stopping: Amit's slides.	HJ: BS lecture 7. Optimal Stopping: BS lecture 6; DF lecture 15; VR lecture 10, section 4.
14	March 3	Eikonal equation for continuous shortest path (Josna Rao). Q-factors and Q-learning (Stephen Pickett).	Jamie Sethian's web page has lots of Fast Marching related material, including an applet demonstrating a continuous version of the travelling salesman problem.	Eikonal equation: Josna's slides. Q-learning: Stephen's notes.	Q-learning: BS lecture 24; DF lecture 7, section 2 and lecture 8, section 2; VR lecture 5, lecture 6 and lecture 7.
15	March 5	Value function approximation with Linear Programming (Jonatan Schroeder). Some of David Poole's interactive applets (Jacek Kisynski).	David Poole's Interactive Demos.	Linear Programming: Jonatan's slides.	none.
16	March 10	Value function approximation with neural networks (Mark Schmidt). Differential dynamic programming (Sang Hoon Yeo).	Function approximation: Mark's Matlab code. May require minFunc.	Function approximation: Mark's slides. DDP: Sang Hoon's slides.	DDP: "Random Sampling of States in Dynamic Programming", Christopher G. Atkeson & Benjamin Stephens, NIPS 2007.
17	March 12	Policy search / reinforcement learning method PEGASUS for helicopter control (Ken Alton). Schemes for solving stationary Hamilton-Jacobi PDEs: Fast Marching, sweeping, transformation to time-dependent form.	autonomous helicopter website (including videos).	Pegasus: Ken's slides.	none.
18	March 17	ADP in sensor networks (Jonatan Schroeder) and LiveVessel (Josna Rao)	none.	Jonatan's slides. Josna's slides.	Sensor Networks: Jason L. Williams, John W. Fisher III, Alan S. Willsky, "Approximate Dynamic Programming for Communication-Constrained Sensor Network Management," IEEE Trans. Signal Processing, v. 55, n. 8, pp. 4300-4311 (August 2007). LiveWire: William A. Barrett & Eric. N. Mortensen, "Interactive Live-Wire Boundary Extraction," Medical Image Analysis, v. 1, n. 4, pp. 331-341 (Sept 1997). LiveVessel: Kelvin Poon, Ghassan Hamarneh & Rafeef Abugharbieh, "Live-Vessel: Extending Livewire for Simultaneous Extraction of Optimal Medial and Boundary Paths in Vascular Images," Medical Image Computing and Computer-Assisted Intervention (MICCAI), LNCS 4792, pp. 444-451 (2007).
19	March 19	ADP for Tetris (Ivan Sham) and ADP with Diffusion Wavelets and Laplacian Eigenfunctions (Ian).	singular value decomposition (SVD) based image compression demo.	Ivan's slides.	Vivek F. Farias & Benjamin Van Roy, "Tetris: A Study of Randomized Constraint Sampling," Probabilistic and Randomized Methods for Design Under Uncertainty (Calafiore & Dabbene eds.) Springer-Verlag (2006). Dimitri P. Bertsekas & Sergey Ioffe, "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming," Report LIDS-P-2349, MIT (1996).. Daniela de Farias & Benjamin Van Roy, "The Linear Programming Approach to Approximate Dynamic Programming," Operations Research, v. 51, n. 6, pp. 850-856 (2003). Sridhar Mahadevan & Mauro Maggioni, "Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions," Neural Information Processing Systems (NIPS), MIT Press (2006).
	March 24	Easter Monday (no class)
	March 26	Class cancelled. Work on your projects.
20	March 31	Rating game players with DP (Stephen Pickett) and Hierarchical discretization with DP (Amit Goyal).	Mark Glickman's research web page.	Amit's slides.	Mark Glickman, "Paired Comparison Models with Time-Varying Parameters", Harvard Dept. Statistics Ph.D. thesis (1993). Ching-Cheng Shen & Yen-Liang Chen, "A Dynamic Programming Algorithm for Hierarchical Discretization of Continuous Attributes," European J. Operational Research, v. 184, n. 2, pp. 636-651 (January 2008).
21	April 2	DP-like Suboptimal Control: Certainty Equivalent Control (CEC), Open-Loop Feedback Control (OLFC), limited lookahead.	none.	BS lecture 13 (CEC) and lecture 14 (limited lookahead).	BB 6.0-6.3
23	April 7	DP-like Suboptimal Control: Rollout, model predictive control and receding horizon.	Example project presentation.	BS lecture 15 (rollout)	BB 6.4-6.5
24	April 9	Project Presentations (peer evaluation form)
	April 15-29	Final Exam Period. Projects due 3pm Friday April 25.