**CPSC 533V 2021W1 Schedule**
This schedule for [CPSC 533V](index.html) will be updated throughout the term.
Wed 8 Sep 2021: Welcome
+ L1
- motivation: example motion skills
- __Animal movement:__
[chicken head tracking](https://www.youtube.com/watch?v=_dPlkFPowCc) -
[falcon](https://mobile.twitter.com/aympontier/status/1213098212904591362) -
[cat stepping](https://mobile.twitter.com/DorsaAmir/status/1206592889750339584) -
[snake climbing tree](https://www.youtube.com/watch?v=e_K-XZPyOfM) -
[goats on dam](https://www.youtube.com/watch?v=RG9TMn1FJzc&t=106) -
[goat leaps](https://youtu.be/Qh02UvQUovU?t=27m24s)
- __Robots:__
[triple inverted pendulum balance](https://mobile.twitter.com/InertialObservr/status/1213261159324741632) -
[flipping pancakes](https://www.youtube.com/watch?v=W_gxLKSsSIE) -
[Atlas "Do You Love Me?"](https://www.youtube.com/watch?v=fn3KWM1kuAw) -
[Atlas Parkour](https://www.youtube.com/watch?v=tF4DML7FIWk) -
[Atlas Parkour fails](https://www.youtube.com/watch?v=Ggk26a7GTbE) -
[Iterative Gait Design for Cassie](https://www.youtube.com/watch?v=TgFrcrARao0) -
[Cassie blind stair climbing](https://www.youtube.com/watch?v=MPhEmC6b6XU) -
[high-speed hand, Komura lab](https://www.youtube.com/watch?v=-KxjVlaLBmk)
- __Humans:__
[cup stacking](https://www.youtube.com/watch?v=Ay8s6vgLZ1c) -
[lifting spinning wheel](https://www.youtube.com/watch?v=GeyDf4ooPdo) -
[human motion sculptures](https://www.youtube.com/watch?v=i0WpFwBuXvI&t=16s)
- __Passive-control and simple-controller motions:__
[Simbicon balance](https://www.youtube.com/watch?v=uBQfSBluhFU) -
[passive dynamic walking](https://www.youtube.com/watch?v=FfKQSUhYjlY) -
[passive juggling](https://www.youtube.com/watch?v=2ZfaADDlH4w) -
[bicycle stability](https://www.youtube.com/watch?v=wrTFE5feh8I&feature=youtu.be) -
[reverse steering bicycle](https://youtu.be/MFzDaBzBlL0?t=33) -
[wheel-with-mass](https://mobile.twitter.com/StevenHCollins/status/1196903413566410752)
- course administrivia
- what is RL?
- RL caveats
- icebreaker
Mon 13 Sep 2021: Sequential decision problems; RL basics
+ L2
- lecture notes: to be posted; see Piazza for link to recorded lecture
Wed 15 Sep 2021: RL basics (continued)
+ L3
- Reading for [Assignment 1](a1.html)
- lecture notes: [2021-sept-15.pdf](2021-sept-15.pdf)
- reading: [Sutton](http://incompleteideas.net/book/the-book-2nd.html), Chapter 3: Finite Markov Decision Processes
- reading: [Sutton](http://incompleteideas.net/book/the-book-2nd.html), Chapter 4: Dynamic Programming
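- For concreteness, a minimal value-iteration sketch in the spirit of the Chapter 4 reading (a sketch only; the two-state MDP and all numbers are invented for illustration, not taken from the lecture notes):

```python
import numpy as np

# Transition model: P[s][a] = list of (prob, next_state, reward).
# This tiny two-state MDP is made up purely to make the sketch runnable.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9
V = np.zeros(2)

for sweep in range(1000):
    # Bellman optimality backup: V(s) <- max_a sum_{s'} p (r + gamma V(s'))
    V_new = np.array([
        max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s])
        for s in sorted(P)
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:   # stop once the backup has converged
        V = V_new
        break
    V = V_new

print("V* =", V)
```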
Mon 20 Sep 2021: Q-functions, MC evaluation, TD learning
+ L4
- Course drop date
- [Assignment 1](a1.html) out; due Wed Sep 29
Wed 22 Sep 2021: Model-free learning, Q-learning
+ L5
- lecture notes, Sept 8-22: [2021-sept-22.pdf](2021-sept-22.pdf)
- reading: [Sutton](http://incompleteideas.net/book/the-book-2nd.html), Chapter 2: Multi-armed Bandits (2.1 -- 2.5)
- reading: [Sutton](http://incompleteideas.net/book/the-book-2nd.html), Chapter 5: Monte Carlo Methods (5.1)
- reading: [Sutton](http://incompleteideas.net/book/the-book-2nd.html), Chapter 6: Temporal Difference Learning (6.1, 6.2, 6.5, 6.7)
- reading: [Sutton](http://incompleteideas.net/book/the-book-2nd.html), Chapter 7: n-Step Bootstrapping (7.1)
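- As a companion to this week's Q-learning and TD material, a minimal tabular Q-learning sketch (the 5-state chain environment and all hyperparameters are made up for illustration):

```python
import random

N_STATES = 5                  # a 1-D chain; reaching the right end pays reward 1
GOAL = N_STATES - 1
ACTIONS = (0, 1)              # 0 = step left, 1 = step right

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, float(s2 == GOAL), s2 == GOAL    # next state, reward, done

Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.1, 0.95, 0.3

for episode in range(500):
    s = 0
    for t in range(200):      # cap episode length
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = 0 if Q[s][0] >= Q[s][1] else 1
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[s2])   # TD target
        Q[s][a] += alpha * (target - Q[s][a])            # TD(0) update
        s = s2
        if done:
            break

print([round(max(q), 2) for q in Q])   # greedy values should rise toward the goal
```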
Mon 27 Sep 2021: Deep Q-Networks
+ L6
- notes: see next day
Wed 29 Sep 2021: DDPG, Policy search
+ L7
- lecture notes, Sept 27-29 (with some overlap with earlier classes): [2021-sept-29.pdf](2021-sept-29.pdf)
- lecture not recorded today (sorry, I forgot!)
- Assignment 1 due today (any time of day)
- Assignment 2: [Python notebook (GitHub)](https://github.com/UBCMOCCA/CPSC533V_2021W1/blob/master/a2/a2.ipynb)
- DPG algorithms: [Deterministic Policy Gradient Algorithms, ICML 2014](http://proceedings.mlr.press/v32/silver14.html)
- DDPG algorithm: [Continuous control with deep reinforcement learning, ICLR 2016 ](https://arxiv.org/abs/1509.02971)
- DDPG stability: [Deep Reinforcement Learning That Matters, AAAI 2018](https://ojs.aaai.org/index.php/AAAI/article/view/11694)
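- To make the DDPG readings concrete, a compressed sketch of one DDPG update step in PyTorch (network sizes, hyperparameters, and the random stand-in for a replay-buffer batch are placeholders, not the settings from the papers above):

```python
import copy
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
actor  = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
pi_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
q_opt  = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, tau = 0.99, 0.005

# A fake replay-buffer batch, just to make the sketch runnable.
s  = torch.randn(32, obs_dim); a = torch.randn(32, act_dim)
r  = torch.randn(32, 1); s2 = torch.randn(32, obs_dim); done = torch.zeros(32, 1)

# Critic: regress Q(s,a) toward the bootstrapped target from the target nets.
with torch.no_grad():
    q_next = critic_targ(torch.cat([s2, actor_targ(s2)], dim=-1))
    target = r + gamma * (1 - done) * q_next
q_loss = ((critic(torch.cat([s, a], dim=-1)) - target) ** 2).mean()
q_opt.zero_grad(); q_loss.backward(); q_opt.step()

# Actor: deterministic policy gradient = ascend Q(s, pi(s)).
pi_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

# Polyak-average the target networks toward the online networks.
for net, targ in ((actor, actor_targ), (critic, critic_targ)):
    for p, pt in zip(net.parameters(), targ.parameters()):
        pt.data.mul_(1 - tau).add_(tau * p.data)
```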
Mon 4 Oct 2021: Policy Gradient methods
+ L8
- lecture notes, Oct 4: [2021-Oct-4.pdf](2021-Oct-4.pdf); live notes: [2021-Oct-4-note.pdf](2021-Oct-4-note.pdf)
- [David Silver: Policy Gradient slides](https://www.davidsilver.uk/wp-content/uploads/2020/03/pg.pdf)
Wed 6 Oct 2021: Policy Gradients, continued
+ L9
- [policy gradients web demo](https://observablehq.com/@farzadab/policy-gradients) by Farzad Abdolhosseini
- policy gradients: review, demo, derivation from importance sampling, GAE, batching, intro to step-size control
- reading: [PPO paper, Schulman 2017](https://arxiv.org/pdf/1707.06347.pdf)
- reading: [Sutton](http://incompleteideas.net/book/the-book-2nd.html), Chapter 13: Policy Gradient Methods
- reading: [What Matters in On-Policy RL?](https://arxiv.org/pdf/2006.05990.pdf)
- A3 out [A3 Python notebook (GitHub)](https://github.com/UBCMOCCA/CPSC533V_2021W1/blob/master/a3/hw3.ipynb), due Fri Oct 15
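- For reference, the score-function (REINFORCE) gradient estimator in its most minimal form: a one-step continuous bandit with a Gaussian policy (a sketch; the quadratic reward and all hyperparameters are invented for illustration):

```python
import torch

mu = torch.zeros(1, requires_grad=True)        # policy: a ~ N(mu, 0.5^2)
opt = torch.optim.Adam([mu], lr=0.05)
sigma = 0.5

for it in range(2000):
    dist = torch.distributions.Normal(mu, sigma)
    a = dist.sample()                          # sample() blocks gradients through a
    reward = -(a - 3.0) ** 2                   # best action is a = 3
    # Score-function estimator: grad of E[R] = E[ grad(log pi(a)) * R ]
    loss = -(dist.log_prob(a) * reward).sum()
    opt.zero_grad(); loss.backward(); opt.step()

print(mu.item())   # should drift toward 3.0 (noisy without a baseline)
```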
(Mon 11 Oct 2021): Thanksgiving
Wed 13 Oct 2021: Policy Gradient methods with step-size control: TRPO and PPO
+ L10
- lecture notes, Oct 13 (including all Policy Gradient slides): [2021-Oct-13.pdf](2021-Oct-13.pdf)
- A3 due on Friday (any time of day)
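- The clipped surrogate objective from the PPO paper fits in a few lines; a sketch (the random tensors stand in for log-probabilities and advantages you would compute from rollouts):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
    ratio = torch.exp(logp_new - logp_old)            # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()      # negate: we maximize the surrogate

# Tiny smoke test with random placeholder tensors.
logp_old = torch.randn(64)
logp_new = logp_old + 0.1 * torch.randn(64)
adv = torch.randn(64)
print(ppo_clip_loss(logp_new, logp_old, adv))
```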
Mon 18 Oct 2021: Model-based RL, Part 1
+ L11
- lecture notes, Oct 18, Model-based RL, part 1: [2021-Oct-18.pdf](2021-Oct-18.pdf)
- [ICML 2020 tutorial on Model-based RL](https://kargarisaac.github.io/blog/reinforcement%20learning/mbrl/jupyter/2020/10/26/mbrl.html)
- [Maria Bauza slides on Model-based RL](https://web.mit.edu/6.246/www/lectures/Model-based-RL.pdf)
- [Cognitive Maps in Rats and Men (Tolman, 1948)](http://www.guillaumegronier.com/2020-psychologiegenerale/resources/Tolman1948.pdf)
- [Aggressive driving with model predictive path integral control, ICRA 2016](https://ieeexplore.ieee.org/iel7/7478842/7487087/07487277.pdf)
- [video](https://www.youtube.com/watch?v=FbcGs-XoiUw)
- [Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, ICRA 2018](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8463189)
- [talk](https://www.youtube.com/watch?v=G7lXiuEC8x0)
- [Recurrent Models Facilitate Policy Evolution, NeurIPS 2018](https://arxiv.org/pdf/1809.01999.pdf)
- [talk](https://www.youtube.com/watch?v=HzA8LRqhujk)
- [World Models, 2018](https://arxiv.org/pdf/1803.10122.pdf) (another version of the paper above)
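- The first step in most model-based RL pipelines is fitting a dynamics model by regression on observed transitions; a minimal sketch (the random "dataset" is a placeholder for real rollout data, and the dimensions are invented):

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 1
# Dynamics model: predicts the next state from (state, action).
model = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                      nn.Linear(64, obs_dim))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Placeholder transition data; in practice these come from environment rollouts.
s, a, s_next = torch.randn(256, obs_dim), torch.randn(256, act_dim), torch.randn(256, obs_dim)

for epoch in range(100):
    pred = model(torch.cat([s, a], dim=-1))
    loss = ((pred - s_next) ** 2).mean()    # one-step prediction error
    opt.zero_grad(); loss.backward(); opt.step()
```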
Wed 20 Oct 2021: Model-based RL, Part 2
+ L12
- lecture notes, Oct 20, Model-based RL, part 2: [2021-Oct-20.pdf](2021-Oct-20.pdf)
- A4: Policy gradients and PPO [Python notebook (GitHub)](https://github.com/UBCMOCCA/CPSC533V_2021W1/tree/master/a4), due Fri Oct 29
- PETS: [Deep RL in a Handful of Trials using Probabilistic Dynamics Models](https://arxiv.org/pdf/1805.12114.pdf)
- Interesting recent paper about PETS: On the Importance of Hyperparameter Optimization for Model-based RL - [project page](https://www.natolambert.com/papers/2021-hyperparams-mbrl)
- Convex MPCs: [https://stanford.edu/class/ee364b/lectures/mpc_slides.pdf](https://stanford.edu/class/ee364b/lectures/mpc_slides.pdf)
- excellent talk by Scott Kuindersma, on Boston Dynamics controllers and MPC [YouTube talk](https://www.youtube.com/watch?v=EGABAx52GKI)
- Control of Rotational Dynamics [video](http://www.macchietto.com/papers/crd2014.mp4)
- Steven Brunton [excellent YouTube MPC lecture](https://www.youtube.com/watch?v=YwodGM2eoy4)
- CORL 2020 talk: Stein Variational MPC [YouTube](https://www.youtube.com/watch?v=kGDO2e_qiyI)
- MPC for the MIT Cheetah quadruped (ICRA 2021 Workshop on Advances in MPC and RL) [YouTube](https://www.youtube.com/watch?v=VRNqz1w-87o)
- Manfred Morari, MPC [invited lecture, YouTube](https://www.youtube.com/watch?v=P8XQRZQQxZE&t=817)
- [SINDY: Sparse identification of nonlinear dynamics for model predictive control in the low-data limit](https://royalsocietypublishing.org/doi/pdf/10.1098/rspa.2018.0335)
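- Putting the pieces together: a random-shooting MPC loop of the kind underlying PETS-style methods (a sketch only; the 1-D point-mass dynamics and cost are invented, and PETS itself uses probabilistic ensembles and CEM rather than uniform action sampling):

```python
import numpy as np

def dynamics(s, a):                 # toy point mass: s = [position, velocity]
    pos, vel = s
    return np.array([pos + 0.1 * vel, vel + 0.1 * a])

def cost(s, a):
    return s[0] ** 2 + 0.01 * a ** 2        # drive the position to zero

def plan(s0, horizon=15, n_samples=500):
    """Random shooting: sample action sequences, roll out the model, keep the best."""
    best_cost, best_seq = np.inf, None
    for _ in range(n_samples):
        seq = np.random.uniform(-1, 1, horizon)
        s, c = s0.copy(), 0.0
        for a in seq:
            c += cost(s, a)
            s = dynamics(s, a)
        if c < best_cost:
            best_cost, best_seq = c, seq
    return best_seq[0]              # receding horizon: execute only the first action

s = np.array([1.0, 0.0])
for t in range(50):                 # replan at every step
    s = dynamics(s, plan(s))
print(s)                            # position should head toward zero
```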
Mon 25 Oct 2021: SAC, TD3, Hindsight Experience Replay, Asymmetric Actor Critic
+ L13
- lecture notes, Oct 25 [2021-Oct-25.pdf](2021-Oct-25.pdf)
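- A schematic of the hindsight-relabeling step at the core of Hindsight Experience Replay (a sketch; the episode format and sparse reward function here are simplified placeholders, not the paper's setup):

```python
def her_relabel(episode, reward_fn):
    """episode: list of (s, a, s_next) triples; returns goal-relabeled transitions."""
    new_goal = episode[-1][2]        # pretend the state we actually reached was the goal
    relabeled = []
    for s, a, s_next in episode:
        r = reward_fn(s_next, new_goal)            # recompute reward w.r.t. the new goal
        relabeled.append((s, a, r, s_next, new_goal))
    return relabeled

# e.g. sparse reward: 1 if we reached the (relabeled) goal, else 0
reward_fn = lambda s, g: float(s == g)
print(her_relabel([(0, 1, 1), (1, 1, 2)], reward_fn))
```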
Wed 27 Oct 2021: Imitation Learning
+ L14
- lecture notes, Oct 27 [2021-Oct-27.pdf](2021-Oct-27.pdf)
- [course project](project.pdf) Proposal due Mon Nov 8
- class survey - [survey.pdf](survey.pdf); see Piazza for link to Gdoc
- Paper discussions -- list of papers, Google doc to be posted in Piazza; add indications of interest by Wed Nov 3
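- As a baseline for the imitation-learning discussion: behavioral cloning is just supervised regression from states to expert actions. A minimal sketch (the random "expert dataset" and dimensions are placeholders):

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Placeholder expert demonstrations; real BC would load recorded (state, action) pairs.
states, expert_actions = torch.randn(512, 4), torch.randn(512, 1)

for epoch in range(100):
    loss = ((policy(states) - expert_actions) ** 2).mean()   # imitate the expert
    opt.zero_grad(); loss.backward(); opt.step()
```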
Mon 1 Nov 2021: Paper discussion -- setup
+ L15
- papers to discuss: expressions of interest
- time to work on project proposals
Wed 3 Nov 2021: Paper discussion: Reward is Enough
+ L16
- [Reward is Enough](http://nooverfit.com/wp/wp-content/uploads/2021/06/1-s2.0-S0004370221000862-main.pdf)
- Discussion questions: (choose 4 out of the following)
1. What is the strongest argument for “reward is enough” ?
2. What is the weakest argument?
3. The paper argues that many possible rewards can be sufficient. Do you agree with this?
4. How do the proposed arguments relate to the evolutionary origins of intelligence?
5. An alternative idea might be to say: “Prediction is enough”, i.e., along the lines of self-supervised learning. Is this a suitable alternative?
6. From a practical perspective, are the proposed ideas actionable? I.e., suppose we wish to design an intelligent system that meets certain specifications; do the arguments in the paper offer concrete guidance?
Mon 8 Nov 2021: Paper discussion: Learning for Quadrupeds
+ L17
- [Learning Agile and Dynamic Motor Skills for Legged Robots](https://arxiv.org/abs/1901.08652)
- Discussion questions:
1. What are the difficulties of learning to control Anymal in simulation, followed by deployment of the learned policy on the physical robot? How are these addressed?
2. How simple or complex is the reward function, and how was it developed?
3. What are the main contributions of the paper?
4. What are the limitations of the work presented?
5. Which parts of the paper did you or your group not fully understand?
(Wed 10 Nov 2021): Midterm break
(Thu 11 Nov 2021): Midterm break
(Fri 12 Nov 2021): Midterm break
Mon 15 Nov 2021: Paper Discussion: Learning to Adapt
+ L18
- [Preparing for the unknown: Learning a universal policy with online system identification](https://ckllab.stanford.edu/preparing-unknown-learning-universal-policy-online-system-identification) RSS 2017
- Related reading:
- [Learning dexterous in-hand manipulation](https://journals.sagepub.com/doi/pdf/10.1177/0278364919887447) - [video](https://www.youtube.com/watch?v=jwSbzNHGflM)
- [Solving Rubik's Cube with a Robot Hand](https://openai.com/blog/solving-rubiks-cube/) - [video](https://www.youtube.com/watch?v=kVmp0uGtShk)
- Discussion questions:
1. Are the universal policy (UP) and Online System Identification (OSI) models trained together or sequentially?
What would be the pros and cons of either approach?
1. What kinds of distributions are used for the unknown model parameters?
1. What is tricky about training the Online System Identification (OSI) model?
1. Describe several limitations and/or directions for future work.
Wed 17 Nov 2021: Paper Discussion: Learning for biomechanics
+ L19
- [Scalable muscle-actuated human simulation and control](https://mrl.snu.ac.kr/research/ProjectScalable/Page.htm) - SIGGRAPH 2019
- Related reading:
- [simple PD controller web demo](https://www.matthewpeterkelly.com/tutorials/pdControl/index.html)
- [Synthesis of Biologically Realistic Human Motion Using Joint Torque Actuation](https://ckllab.stanford.edu/synthesis-biologically-realistic-human-motion-using-joint-torque-actuation), SIGGRAPH 2019
- Discussion questions:
1. What is the advantage of using a muscle coordination network, over simply trying to control all the muscles directly?
1. What is the available range of muscle activations? What stops the muscle coordination network from outputting muscle activations outside of this range?
1. How important is the foot model?
1. What parts of the paper did your group not understand?
1. Could the work be applied to non-imitation based tasks?
Mon 22 Nov 2021: Paper Discussion: Contrastive Unsupervised Representations for RL
+ L20
- Paper to read: [CURL: Contrastive Unsupervised Representations for Reinforcement Learning](https://arxiv.org/abs/2004.04136) - ICML 2020
- Discussion questions (answer 4 of 5):
1. For RL, what is the potential benefit of good (learned) representations?
2. What is the dimension of the learned embeddings actually used in the paper? (see supplemental material at the end)
3. Both the RL policy and the contrastive loss update the encoder. How are these weighted?
What happens if only the contrastive loss is used to learn the encoder?
4. Does the paper make a significant advance? Why or why not?
5. What parts of the paper did your group not understand?
- [Yannic Kilcher discussion -- recommended](https://www.youtube.com/watch?v=hg2Q_O5b9w4)
- [Blog description](https://bair.berkeley.edu/blog/2020/07/19/curl-rad/)
- Related paper: [RL with Augmented Data](https://proceedings.neurips.cc/paper/2020/file/e615c82aba461681ade82da2da38004a-Paper.pdf) - NeurIPS 2020
- Related paper: [Decoupling Representation Learning from RL](http://proceedings.mlr.press/v139/stooke21a.html) - ICML 2021
- Related paper: [Representation Learning with Contrastive Predictive Coding](https://arxiv.org/pdf/1807.03748.pdf) - NeurIPS 2018
- Related paper: [Unsupervised State Representation Learning in Atari](https://arxiv.org/pdf/1906.08226.pdf) - NeurIPS 2019
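- The contrastive objective at the core of CURL is an InfoNCE-style loss; a simplified sketch (function and variable names are ours; CURL pairs random crops of stacked frames, computes keys with a momentum encoder, and uses a bilinear similarity, all simplified away here):

```python
import torch
import torch.nn.functional as F

def info_nce(queries, keys, temperature=0.1):
    """queries, keys: (batch, dim); row i of keys is the positive for row i of queries."""
    q = F.normalize(queries, dim=-1)
    k = F.normalize(keys, dim=-1)
    logits = q @ k.t() / temperature          # similarity of every (query, key) pair
    labels = torch.arange(q.shape[0])         # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Placeholder embeddings of two augmented views of the same batch.
z1, z2 = torch.randn(32, 50), torch.randn(32, 50)
print(info_nce(z1, z2))
```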
Wed 24 Nov 2021: Paper Discussion: Character controllers using motion VAEs
+ L21
- Paper to read: [Character controllers using motion VAEs](https://hungyuling.com/projects/MVAE/) - SIGGRAPH 2020
- Discussion questions:
1. What is a variational autoencoder? How does it differ from a simple autoencoder?
1. What is an autoregressive model?
1. What is the action space for the learned control policy?
1. What is "scheduled sampling" and why is it important for learning the autoregressive VAE?
1. What parts of the paper did your group not understand?
- accessible explanation of [variational autoencoders](https://jaan.io/what-is-variational-autoencoder-vae-tutorial/)
- Other learned methods for kinematic animation: See "Phase functioned neural networks" on [Daniel Holden's web page](https://theorangeduck.com/page/publications)
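- A generic VAE reparameterization-and-loss sketch to accompany the discussion questions (dimensions are placeholders; this is not the motion-VAE architecture from the paper):

```python
import torch
import torch.nn as nn

x_dim, z_dim = 16, 4
enc = nn.Linear(x_dim, 2 * z_dim)    # encoder outputs mean and log-variance of q(z|x)
dec = nn.Linear(z_dim, x_dim)        # decoder maps latent back to data space

x = torch.randn(32, x_dim)           # placeholder data batch
mu, logvar = enc(x).chunk(2, dim=-1)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
x_hat = dec(z)

recon = ((x_hat - x) ** 2).mean()                          # reconstruction term
kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).mean()   # KL(q(z|x) || N(0, I))
loss = recon + kl                                          # the (negative) ELBO
```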
Mon 29 Nov 2021: Paper Discussion: Emergent Tool Use from Multi-Agent Autocurricula
+ L22
- LQR control [LQR.pdf](LQR.pdf) (a small Riccati-recursion sketch follows this block)
- Paper to read: [Emergent Tool Use From Multi-Agent Autocurricula](https://arxiv.org/abs/1909.07528)
- Discussion questions:
1. What are the physics and action capabilities of an agent? Do agents have memory?
1. What are the team sizes? Do all agents on a team share the same policy? Does an agent know where its other team members are?
1. Does more initial randomization help or hinder the sophistication of the discovered strategies?
1. Do you think this paper makes a strong contribution or not?
- [Related blog](https://openai.com/blog/emergent-tool-use/)
- Related paper: [Multi-agent actor-critic for mixed cooperative-competitive environments](https://arxiv.org/pdf/1706.02275.pdf) - NeurIPS 2017
- [Related video](https://sites.google.com/site/multiagentac/)
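- The Riccati-recursion sketch promised above, for finite-horizon discrete-time LQR (a generic sketch to go with the LQR notes; the double-integrator system is just an example):

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # x' = A x + B u (double integrator)
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)         # state and control cost weights

P = Q.copy()
for _ in range(200):                       # backward Riccati recursion
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # optimal gain: u = -K x
    P = Q + A.T @ P @ (A - B @ K)

x = np.array([[1.0], [0.0]])
for t in range(100):                       # closed-loop rollout under u = -K x
    x = (A - B @ K) @ x
print(x.ravel())                           # state should decay toward zero
```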
Wed 1 Dec 2021: Paper Discussion: AlphaGo Zero
+ L23
- Paper to read: [AlphaGo Zero](https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf)
- Discussion questions:
1. Does AlphaGo Zero play directly using a learned policy, model-based planning, or both?
1. How does AlphaGo Zero differ from earlier versions of AlphaGo?
1. Does prior knowledge of rotational/reflectional invariance help inform the algorithm?
1. Are the policy and value networks independent, or do they use a shared network?
1. What parts of the paper did your group not understand?
- Related paper: [AlphaZero: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://discovery.ucl.ac.uk/id/eprint/10069050/1/alphazero_preprint.pdf) - Science 2018
- recommended: [David Silver Alpha Zero talk](https://www.youtube.com/watch?v=Wujy7OzvdJk)
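- The PUCT selection rule at the heart of AlphaGo Zero's tree search, in schematic form (the statistics for the three candidate moves are made-up numbers):

```python
import math

def puct_score(Q, N_child, P, N_parent, c_puct=1.5):
    """Mean value Q plus an exploration bonus scaled by the policy prior P."""
    return Q + c_puct * P * math.sqrt(N_parent) / (1 + N_child)

# Pick the child maximizing the score: high value, high prior, low visit count.
children = [  # (Q, visit count, prior probability) -- illustrative numbers
    (0.5, 10, 0.2),
    (0.3,  2, 0.5),
    (0.9, 30, 0.1),
]
N_parent = sum(n for _, n, _ in children)
best = max(children, key=lambda c: puct_score(c[0], c[1], c[2], N_parent))
print(best)
```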
Mon 6 Dec 2021: Wrap-up lecture
+ L24
- RL as a form of supervised learning [2021-Dec-6.pdf](2021-Dec-6.pdf)
- wrap up