Contents of this page 
#  Date  Topic 

1  April 2  Peer evaluation form for project presentations. 
2  April 6  Description of the contents of your final project reports. 
3  April 2  Sample project presentation. 
Homework Submission Policy:
#  Due Date  Assignment  Other Assignment Files  Solution  Other Solution Files 

1  April 11  Problem Set  none  none  Matlab files: solveQ4, which requires discreteLQR and continuousLQR 
Course Overview

Dynamic Programming: In many complex systems we have access to controls, actions or decisions with which we can attempt to improve or optimize the behaviour of that system; for example, in the game of Tetris we seek to rotate and shift (our control) the position of falling pieces to try to minimize the number of holes (our optimization objective) in the rows at the bottom of the board. If those decisions must be made sequentially, we may not be able to anticipate the long-term effect of a decision before the next must be made; in our example, should we use a piece to partially fill a hole even though a piece better suited to that hole might be available shortly? Dynamic programming (DP) is a very general technique for solving such problems. Unlike many other optimization methods, DP can handle nonlinear, nonconvex and nondeterministic systems, works in both discrete and continuous spaces, and locates the globally optimal solution among those available. DP and closely related algorithms have been applied in many fields, and among its instantiations are:
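As a toy illustration of the idea (this example is illustrative only, not part of the course materials): backward induction computes, stage by stage, the optimal cost-to-go at every state by combining the immediate cost with the best cost-to-go of the reachable next states; this is the dynamic programming principle in its simplest form.

```python
# Backward induction for a tiny finite-horizon problem: from state s we
# may move to s-1, s, or s+1 (clipped to the state space), paying a
# state-dependent cost at every stage; minimize the total cost.
T = 4                           # number of decision stages
states = range(5)               # states 0..4
cost = [3, 1, 4, 1, 5]          # cost of occupying each state

# V[s] = optimal cost-to-go from state s; start from the terminal stage
V = {s: cost[s] for s in states}
policy = []                     # policy[t][s] = best next state at stage t
for t in range(T):
    Vnew, pi = {}, {}
    for s in states:
        moves = [m for m in (s - 1, s, s + 1) if m in states]
        best = min(moves, key=lambda m: V[m])   # DP principle: immediate
        pi[s] = best                            # cost plus optimal cost-to-go
        Vnew[s] = cost[s] + V[best]
    V, policy = Vnew, [pi] + policy
```

Following `policy` from state 0 moves immediately to the cheap state 1 and stays there, for a total cost matching `V[0]`. Note the global optimum falls out of purely local stage-by-stage computations.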
Approximate Dynamic Programming: Although several of the problems above take special forms, general DP suffers from the "Curse of Dimensionality": the computational complexity grows exponentially with the dimension of the system. Approximate DP (ADP) algorithms (including "neuro-dynamic programming" and others) are designed to capture the benefits of DP without paying the full computational cost. Among other applications, ADP has been used to play Tetris and to stabilize and fly an autonomous helicopter.

Expectations: In addition to attending lectures, students will:
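To make the curse of dimensionality concrete: a grid with 100 points per dimension has 100^d states, so a table-based value function is hopeless beyond a few dimensions. One standard ADP remedy is to keep only a few parameters of an approximation architecture and fit them to Bellman backups at sampled states. The sketch below (the chain-walk problem and all parameters are illustrative, not course material) does this with a quadratic-in-position approximation:

```python
import numpy as np

# Chain-walk problem: states 0..N-1, actions step left or right (clipped),
# cost 1 per step, discounted; state N-1 is an absorbing, cost-free goal.
# Rather than storing all N values, approximate V(s) ~ w0 + w1*(s/N) + w2*(s/N)^2
# and fit the weights to Bellman backups at a small set of sample states.
N, gamma = 1000, 0.95
samples = np.linspace(0, N - 1, 50)        # 50 representative states

def features(s):
    x = s / N
    return np.array([1.0, x, x * x])

Phi = np.stack([features(s) for s in samples])

w = np.zeros(3)
for _ in range(200):                       # approximate value iteration
    targets = []
    for s in samples:
        if s == N - 1:
            targets.append(0.0)            # absorbing goal costs nothing
        else:
            backups = [1.0 + gamma * features(nxt) @ w
                       for nxt in (max(s - 1, 0), min(s + 1, N - 1))]
            targets.append(min(backups))   # Bellman backup at this sample
    w = np.linalg.lstsq(Phi, np.array(targets), rcond=None)[0]  # refit weights
```

Three fitted parameters stand in for a table of 1000 values, at the price of approximation error: the fit is crude near the goal but captures the overall shape of the cost-to-go.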

Intended Audience: Graduate students in
Prerequisites: None officially.
Computer Science Breadth: This course does not count toward the computer science graduate breadth requirement.
Instructor: Ian Mitchell
Lectures: 3:30-5:00, Mondays and Wednesdays, ICICS/CS 238
Grades: Your final grade will be based on a combination of
Text References: Some of these are available from the library or reading room. All can be borrowed temporarily from me.
Online References:
In the first few lectures I will cover the basic concepts of DP: formulating the system model and optimization criterion, the value function and Dynamic Programming Principle (DPP), policies and feedback control, shortest path algorithms, and basic value and policy iterations.
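As a preview of those basic algorithms (a toy sketch, not course material): on a two-state minimum-cost problem, value iteration repeatedly applies the DPP backup until the value function converges, while policy iteration alternates exact policy evaluation with greedy improvement; both arrive at the same optimal values and feedback policy.

```python
# A tiny infinite-horizon discounted problem, solved two ways.
# States 0,1; in each state, action 0 or 1 moves deterministically
# to a next state and incurs a cost; minimize total discounted cost.
cost = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 3.0}
nxt  = {(0, 0): 0,   (0, 1): 1,   (1, 0): 0,   (1, 1): 1}
gamma = 0.5
states, actions = (0, 1), (0, 1)

def backup(V, s, a):
    # DPP: immediate cost plus discounted cost-to-go of the next state
    return cost[s, a] + gamma * V[nxt[s, a]]

# --- Value iteration: apply the DPP backup until convergence ---
V = [0.0, 0.0]
for _ in range(100):
    V = [min(backup(V, s, a) for a in actions) for s in states]

# --- Policy iteration: evaluate a policy exactly, then improve it ---
policy = [0, 0]
for _ in range(10):
    Vp = [0.0, 0.0]
    for _ in range(200):                   # policy evaluation (fixed point)
        Vp = [backup(Vp, s, policy[s]) for s in states]
    # policy improvement: act greedily with respect to Vp
    policy = [min(actions, key=lambda a: backup(Vp, s, a)) for s in states]
```

For this problem both methods agree: from state 0 it is optimal to pay nothing now and move to state 1 (V(0) = 4/3), and from state 1 to pay 2 and return to state 0 (V(1) = 8/3).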
After these lectures, we will run the course more like a reading group. In consultation with me, students may choose topics for which there are suitable notes and/or research papers, the class will read the material, and then the student will lead a discussion. Some of these topics are large, so students can choose some suitable subset on which to lead a discussion.
Topics that we will definitely cover (e.g., I will lead the discussion if nobody else wants to):
Topics that we will cover if somebody volunteers (e.g., I already know of suitable reading material):
Students are welcome to propose other topics, but may have to identify suitable reading material before they are included in the schedule.
Reading Sources  

Abbreviation  Source 
BS  Bertsekas Slides 
BB  Dynamic Programming & Optimal Control by Bertsekas (Table of Contents) 
BT  Neuro-Dynamic Programming by Bertsekas and Tsitsiklis (Table of Contents) 
CLRS  Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein (Table of Contents) 
DF  De Farias Notes 
HLADP  Handbook of Learning and Approximate Dynamic Programming edited by Si, Barto, Powell and Wunsch (Table of Contents) 
VR  Van Roy Notes 
#  Date  Topic  Links  Required Readings  Optional Readings 

1  Jan 9  Introduction. System representation.  none.  VR lecture 1, sections 1-3.  BS lecture 1; DF lecture 1, sections 1, 3; BB 1.1, 1.4 
2  Jan 14  We will adjourn the class to the IAM Distinguished Lecture: Prof. Richard Baraniuk, speaking on Compressive Signal Processing. Note special time & location: LSK 301, 3pm-4pm. This topic is not directly related to dynamic programming, but maybe it could be...  
3  Jan 16  Optimality criteria (finite horizon, discounting). Value function. Dynamic programming principle. Feedback policies.  none.  BS lecture 2; VR lecture 1, section 4, lecture 2, sections 1-2.  DF lecture 1, sections 2, 4, 5; BB 1.2-1.3; BT 2.1.0-2.1.1 
4  Jan 21  Transforming finite DP into graph shortest path.  none.  BS lecture 3.  BB 2; BT 2.2.1 
5  Jan 23  DP for solving graph shortest path: basic label correcting algorithm.  none.  BS lecture 4.  BB 2; BT 2.2.1; CLRS 25 
6  Jan 28  Various examples of label correcting algorithms. Efficiency improvements. Infinite horizon problems.  none.  BS lecture 4; DF lecture 1, section 5; VR lecture 2, section 2.  BS lecture 17, lecture 18; BB 2; BT 2.2.1; CLRS 25 
7  Jan 30  Value iteration. Policy iteration.  Value iteration applet from David Poole's Interactive Demos.  DF lecture 1, section 6, lecture 2, lecture 3; VR lecture 2, sections 3-4, lecture 3.  BT 2.2.2-2.2.3, 3, 5-6; BB 6.3-6.5. 
8  Feb 4  Neuro-dynamic programming overview. Discrete time Linear Quadratic Regulator (LQR) optimal control.  none.  D. P. Bertsekas "Neuro-Dynamic Programming", Encyclopedia of Optimization (Kluwer, 2001); D. P. Bertsekas "Neuro-Dynamic Programming: an Overview" slides; Stephen Boyd's notes on discrete time LQR; BS lecture 5.  D. P. Bertsekas "Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC" European J. Control, v. 11, n. 4-5 (2005). BB 4.1. 
9  Feb 6  Infinite horizon and continuous time LQR optimal control.  none.  Stephen Boyd's notes on infinite horizon LQR and continuous time LQR.  BB 4.1. 
10  Feb 11  We will adjourn the class to the IAM Seminar: Prof. Michael Friedlander, speaking on Algorithms for Large-Scale Sparse Reconstruction. Note special time & location: LSK 301, 3pm-4pm. This talk will describe how to solve the problems that were posed in Baraniuk's talk from January 14.  
11  Feb 13  TD Learning (Jacek Kisynski).  none.  DF lecture 13 and lecture 14.  VR lecture 9 and lecture 10, sections 1-3. BS lecture 23. 
Feb 18-22  Midterm Break (no class)  
12  Feb 25  Kalman filters (M. Emtiyaz Khan).  Emtiyaz's Kalman filter demo: KFdemo.m and KF.m  Emtiyaz's slides. Stephen Boyd's notes on Kalman filters.  BS lecture 11. A short note by Emtiyaz on information filters. 
13  Feb 27  Hamilton-Jacobi equation for nonlinear optimal control (Ivan Sham). Optimal Stopping (Amit Goyal).  none.  HJ: Ivan's slides. Optimal Stopping: Amit's slides.  HJ: BS lecture 7. Optimal Stopping: BS lecture 6; DF lecture 15; VR lecture 10, section 4. 
14  March 3  Eikonal equation for continuous shortest path (Josna Rao). Q-factors and Q-learning (Stephen Pickett).  Jamie Sethian's web page has lots of Fast Marching related material, including an applet demonstrating a continuous version of the travelling salesman problem.  Eikonal equation: Josna's slides. Q-learning: Stephen's notes.  Q-learning: BS lecture 24; DF lecture 7, section 2 and lecture 8, section 2; VR lecture 5, lecture 6 and lecture 7. 
15  March 5  Value function approximation with Linear Programming (Jonatan Schroeder). Some of David Poole's interactive applets (Jacek Kisynski).  David Poole's Interactive Demos.  Linear Programming: Jonatan's slides.  none. 
16  March 10  Value function approximation with neural networks (Mark Schmidt). Differential dynamic programming (Sang Hoon Yeo).  Function approximation: Mark's Matlab code. May require minFunc.  Function approximation: Mark's slides. DDP: Sang Hoon's slides.  DDP: "Random Sampling of States in Dynamic Programming", Christopher G. Atkeson & Benjamin Stephens, NIPS 2007. 
17  March 12  Policy search / reinforcement learning method PEGASUS for helicopter control (Ken Alton). Schemes for solving stationary Hamilton-Jacobi PDEs: Fast Marching, sweeping, transformation to time-dependent form.  autonomous helicopter website (including videos).  Pegasus: Ken's slides.  none. 
18  March 17  ADP in sensor networks (Jonatan Schroeder) and Live-Vessel (Josna Rao)  none.  Jonatan's slides. Josna's slides.  Sensor Networks: Jason L. Williams, John W. Fisher III, Alan S. Willsky, "Approximate Dynamic Programming for Communication-Constrained Sensor Network Management," IEEE Trans. Signal Processing, v. 55, n. 8, pp. 4300-4311 (August 2007). Live-Wire: William A. Barrett & Eric N. Mortensen, "Interactive Live-Wire Boundary Extraction," Medical Image Analysis, v. 1, n. 4, pp. 331-341 (Sept 1997). Live-Vessel: Kelvin Poon, Ghassan Hamarneh & Rafeef Abugharbieh, "Live-Vessel: Extending Livewire for Simultaneous Extraction of Optimal Medial and Boundary Paths in Vascular Images," Medical Image Computing and Computer-Assisted Intervention (MICCAI), LNCS 4792, pp. 444-451 (2007). 
19  March 19  ADP for Tetris (Ivan Sham) and ADP with Diffusion Wavelets and Laplacian Eigenfunctions (Ian).  singular value decomposition (SVD) based image compression demo.  Ivan's slides.  Vivek F. Farias & Benjamin Van Roy, "Tetris: A Study of Randomized Constraint Sampling," Probabilistic and Randomized Methods for Design Under Uncertainty (Calafiore & Dabbene eds.) Springer-Verlag (2006). Dimitri P. Bertsekas & Sergey Ioffe, "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming," Report LIDS-P-2349, MIT (1996). Daniela de Farias & Benjamin Van Roy, "The Linear Programming Approach to Approximate Dynamic Programming," Operations Research, v. 51, n. 6, pp. 850-856 (2003). Sridhar Mahadevan & Mauro Maggioni, "Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions," Neural Information Processing Systems (NIPS), MIT Press (2006). 
March 24  Easter Monday (no class)  
March 26  Class cancelled. Work on your projects.  
20  March 31  Rating game players with DP (Stephen Pickett) and Hierarchical discretization with DP (Amit Goyal).  Mark Glickman's research web page.  Amit's slides.  Mark Glickman, "Paired Comparison Models with Time-Varying Parameters", Harvard Dept. Statistics Ph.D. thesis (1993). Ching-Cheng Shen & Yen-Liang Chen, "A Dynamic Programming Algorithm for Hierarchical Discretization of Continuous Attributes," European J. Operational Research, v. 184, n. 2, pp. 636-651 (January 2008). 
21  April 2  DP-like Suboptimal Control: Certainty Equivalent Control (CEC), Open-Loop Feedback Control (OLFC), limited lookahead.  none.  BS lecture 13 (CEC) and lecture 14 (limited lookahead).  BB 6.0-6.3 
23  April 7  DP-like Suboptimal Control: Rollout, model predictive control and receding horizon.  Example project presentation.  BS lecture 15 (rollout)  BB 6.4-6.5 
24  April 9  Project Presentations (peer evaluation form)  
April 15-29  Final Exam Period. Projects due 3pm Friday April 25. 