**CPSC 533V Sep-Dec 2024 Schedule**
This schedule for [CPSC 533V](index.html) will be updated throughout the term.
Wed 4 Sep 2024: Introduction
+ L1
- [533V-Sep-4.pdf](533V-Sep-4.pdf)
- Introduction & Motivation
- course administrivia
- what is RL?
- icebreaker
Mon 9 Sep 2024: Sequential decision problems; RL basics
+ L2
- motivation: example motion skills
- __Animal movement:__
[chicken head tracking](https://www.youtube.com/watch?v=_dPlkFPowCc) -
[falcon](https://mobile.twitter.com/aympontier/status/1213098212904591362) -
[cat stepping](https://mobile.twitter.com/DorsaAmir/status/1206592889750339584) -
[snake climbing tree](https://www.youtube.com/watch?v=e_K-XZPyOfM) -
[goats on dam](https://www.youtube.com/watch?v=RG9TMn1FJzc&t=106) -
[goat leaps](https://youtu.be/Qh02UvQUovU?t=27m24s)
- __Robots:__
[triple inverted pendulum balance](https://mobile.twitter.com/InertialObservr/status/1213261159324741632) -
[flipping pancakes](https://www.youtube.com/watch?v=W_gxLKSsSIE) -
[Atlas "Do You Love Me?"](https://www.youtube.com/watch?v=fn3KWM1kuAw) -
[Atlas Parkour](https://www.youtube.com/watch?v=tF4DML7FIWk) -
[Atlas Parkour fails](https://www.youtube.com/watch?v=Ggk26a7GTbE) -
[Iterative Gait Design for Cassie](https://www.youtube.com/watch?v=TgFrcrARao0) -
[Cassie blind stair climbing](https://www.youtube.com/watch?v=MPhEmC6b6XU)
- __Humans:__
[cup stacking](https://www.youtube.com/watch?v=Ay8s6vgLZ1c) -
[lifting spinning wheel](https://www.youtube.com/watch?v=GeyDf4ooPdo) -
[human motion sculptures](https://www.youtube.com/watch?v=i0WpFwBuXvI&t=16s)
- __Passive-control and simple-controller motions:__
[Simbicon balance](https://www.youtube.com/watch?v=uBQfSBluhFU) -
[passive dynamic walking](https://www.youtube.com/watch?v=FfKQSUhYjlY) -
[passive juggling](https://www.youtube.com/watch?v=2ZfaADDlH4w) -
[bicycle stability](https://www.youtube.com/watch?v=wrTFE5feh8I&feature=youtu.be) -
[wheel-with-mass](https://mobile.twitter.com/StevenHCollins/status/1196903413566410752)
- lecture notes: to be posted; see Piazza for link to recorded lecture
- [cs-vs-eng-blank.pdf](cs-vs-eng-blank.pdf)
- [cs-vs-eng.pdf](cs-vs-eng.pdf)
- RL caveats
- Markov processes, Markov decision processes (MDPs); see the toy MDP sketch below
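- a toy MDP written out as plain Python data, with a sampled rollout (a minimal sketch; the two-state MDP here is an illustrative assumption, not course code):
```python
import random

# A tiny finite MDP: P[s][a] = [(prob, next_state, reward), ...]
P = {
    "s0": {"stay": [(1.0, "s0", 0.0)],
           "go":   [(0.8, "s1", 1.0), (0.2, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 0.0)],
           "go":   [(1.0, "s0", 0.0)]},
}

def step(state, action):
    """Sample one transition (next state, reward) from the MDP."""
    probs, next_states, rewards = zip(*P[state][action])
    i = random.choices(range(len(probs)), weights=probs)[0]
    return next_states[i], rewards[i]

s = "s0"
for _ in range(5):          # roll out a fixed policy: always "go"
    s, r = step(s, "go")
    print(s, r)
```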
Wed 11 Sep 2024: Markov Decision Processes
+ L3
- POMDPs, episodic tasks, discounting, deterministic vs stochastic policies
- cumulative returns and expected returns (see the discounted-return sketch below)
- Assignment 1: [a1.pdf](a1.pdf) - due Wed Sep 25
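- a minimal sketch of computing the discounted return G_t = Σ_k γ^k r_{t+k} discussed above (the reward list and γ are illustrative):
```python
def discounted_return(rewards, gamma=0.99):
    """Discounted return of a finite episode, computed backwards:
    G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 0.0, 0.0, 10.0], gamma=0.9))  # 1 + 0.9**3 * 10 = 8.29
```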
Mon 16 Sep 2024: More RL basics
- policy iteration, value iteration, state-value functions, state-action value functions (value-iteration sketch below)
- stochastic MDPs
- model-based vs model-free: learning and implicit policies
- dynamic programming (model-based) methods
- alternatives to RL for model-based scenarios: linear programming, solving for level sets with PDEs
- basic building blocks for deep RL algorithms
- [notes-Sept-4-to-16.pdf](notes-Sept-4-to-16.pdf)
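- a minimal value-iteration sketch for a known (model-based) tabular MDP; the random toy MDP below is an illustrative assumption:
```python
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
# P[s, a, s'] = transition probability; R[s, a] = expected reward
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))
gamma = 0.9

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
    Q = R + gamma * P @ V          # (S, A, S') @ (S',) -> (S, A)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
pi = Q.argmax(axis=1)              # greedy policy w.r.t. the converged values
print(V, pi)
```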
Wed 18 Sep 2024: Q-learning
- [notes-Sept-18-v2.pdf](notes-Sept-18-v2.pdf)
- [robot-state.pdf](robot-state.pdf)
- state and action spaces for movement control
- temporal-difference learning, n-step TD, TD(λ) (TD(0) sketch below)
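- a TD(0) update sketch for estimating V under a fixed policy (the dict-based value table and Gym-style transition fields are assumptions):
```python
def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=0.99):
    """One-step temporal-difference update toward the TD target."""
    target = r + (0.0 if done else gamma * V[s_next])  # bootstrap from V(s')
    V[s] += alpha * (target - V[s])
    return V

V = {0: 0.0, 1: 0.0}
td0_update(V, s=0, r=1.0, s_next=1, done=False)
```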
Mon 23 Sep 2024: Q-learning
- ε-greedy exploration, Q-learning (tabular Q-learning sketch below)
- maximization bias, double Q-learning, constant vs. variable Q-learning step sizes
- loss function for Q-learning optimization
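- a tabular Q-learning sketch with ε-greedy exploration; the gymnasium FrozenLake environment is an assumed stand-in, not necessarily the course's setup:
```python
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(5000):
    s, _ = env.reset()
    done = False
    while not done:
        # ε-greedy: explore with probability eps, otherwise act greedily
        a = env.action_space.sample() if np.random.rand() < eps else int(Q[s].argmax())
        s_next, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        # off-policy Q-learning target: max over next actions
        target = r + (0.0 if terminated else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
```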
Wed 25 Sep 2024: DQN
- Deep Q Networks
- dueling DQN, distributional RL, current DQN methods
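- a sketch of the DQN regression loss with a target network (the network modules and replay-batch format are assumptions):
```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'); regress Q(s, a) toward y."""
    s, a, r, s_next, done = batch                         # tensors from a replay buffer
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) for the taken actions
    with torch.no_grad():                                 # no gradient through the target net
        y = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    return F.mse_loss(q_sa, y)
```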
Mon 30 Sep 2024: No class
- no class
Wed 2 Oct 2024: DDPG, TD3
- DDPG, TD3 (TD3 critic-target sketch below)
- [notes-DQN-DDPG-TD3-PGintro.pdf](notes-DQN-DDPG-TD3-PGintro.pdf)
- A1 back on Thu
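- a sketch of TD3's critic target, combining clipped double-Q with target-policy smoothing (module names and hyperparameters are illustrative):
```python
import torch

def td3_target(q1_t, q2_t, pi_t, r, s_next, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """q1_t, q2_t: target critics; pi_t: target actor (assumed torch modules)."""
    with torch.no_grad():
        a_next = pi_t(s_next)
        # target-policy smoothing: add clipped noise to the target action
        noise = (noise_std * torch.randn_like(a_next)).clamp(-noise_clip, noise_clip)
        a_next = (a_next + noise).clamp(-act_limit, act_limit)
        # clipped double-Q: the minimum of the two target critics curbs overestimation
        q_next = torch.min(q1_t(s_next, a_next), q2_t(s_next, a_next))
        return r + gamma * (1.0 - done) * q_next
```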
Mon 7 Oct 2024: Policy Methods, Hill Climbing
- A2 out (due Fri Oct 18): tabular Q, DQN --- [A2 github](https://github.com/UBCMOCCA/CPSC533V_2024W1/tree/main)
- Policy Methods and Policy Gradient: Introduction
- [notes-PG-basic.pdf](notes-PG-basic.pdf)
Wed 9 Oct 2024: Policy Gradient
- REINFORCE
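- a REINFORCE loss sketch; the mean-return baseline is a common variance-reduction variant, and the policy-logits interface is an assumption:
```python
import torch

def reinforce_loss(logits, actions, returns):
    """Minimize -E[(G_t - b) * log pi(a_t | s_t)] over a batch of transitions."""
    log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
    advantages = returns - returns.mean()   # baseline reduces variance, not bias
    return -(log_probs * advantages).mean()
```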
Wed 16 Oct 2024: Proximal Policy Optimization (PPO)
- step sizes, TRPO, PPO
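- a sketch of PPO's clipped surrogate objective (tensor shapes and clip_eps are illustrative):
```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """ratio = pi_new(a|s) / pi_old(a|s); clipping keeps updates near the old policy."""
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()   # pessimistic bound, negated for SGD
```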
Mon 21 Oct 2024: Soft Actor Critic (SAC)
- Soft Actor-Critic (SAC) (critic-target sketch below)
- Model Predictive Control (introduction)
- [notes-all-to-MPC.pdf](notes-all-to-MPC.pdf)
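- a sketch of SAC's entropy-regularized critic target (the actor/critic interfaces and temperature α are assumptions):
```python
import torch

def sac_target(q1_t, q2_t, pi, r, s_next, done, gamma=0.99, alpha=0.2):
    """y = r + gamma * (min_i Q_i(s', a') - alpha * log pi(a'|s')), a' ~ pi(.|s').
    `pi` is assumed to return a sampled action and its log-probability."""
    with torch.no_grad():
        a_next, logp_next = pi(s_next)
        q_next = torch.min(q1_t(s_next, a_next), q2_t(s_next, a_next))
        return r + gamma * (1.0 - done) * (q_next - alpha * logp_next)
```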
Wed 23 Oct 2024: Model Predictive Control
- Model Predictive Control
- [notes-all-to-MPC.pdf](notes-all-to-MPC.pdf)
- [A3](https://github.com/UBCMOCCA/CPSC533V_2024W1) Policy Gradients and PPO (out as of Thu Oct 24; due Fri Nov 8)
Mon 28 Oct 2024: Model Predictive Control
- probabilistic ensembles with trajectory sampling (PETS)
- back-propagation through time
- MPC taxonomy summary (random-shooting MPC sketch below)
- which RL algorithm to use?
- [notes-all-to-MPC.pdf](notes-all-to-MPC.pdf)
- [Project Proposal and Project](project.html) - Project Proposal due Tue Nov 12
- [Paper Discussions](discussion.html) - Votes for Papers due Wed Nov 13
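- a random-shooting MPC sketch: sample action sequences, roll each through a dynamics model, and execute the first action of the best sequence. The `dynamics` and `reward` functions are assumed; a PETS-style method would replace `dynamics` with a learned probabilistic ensemble:
```python
import numpy as np

def mpc_random_shooting(s0, dynamics, reward, horizon=15, n_samples=500, act_dim=2):
    seqs = np.random.uniform(-1.0, 1.0, size=(n_samples, horizon, act_dim))
    returns = np.zeros(n_samples)
    for i, seq in enumerate(seqs):
        s = s0
        for a in seq:                       # roll out this candidate action sequence
            returns[i] += reward(s, a)
            s = dynamics(s, a)
    best = seqs[returns.argmax()]
    return best[0]                          # execute only the first action, then replan
```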
Wed 30 Oct 2024: Common RL tricks
- Asymmetric Actor Critic
- Privileged Learning
- DAgger
- Hindsight Experience Replay (relabeling sketch below)
- POMDPs and Policies with memory
- Dynamics Randomization
- [notes-RL-tricks.pdf](notes-RL-tricks.pdf)
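- a Hindsight Experience Replay relabeling sketch using the "final" goal-selection strategy (the transition format and `compute_reward` are illustrative assumptions):
```python
def her_relabel(episode, compute_reward):
    """episode: list of (s, a, r, s_next, goal, achieved_goal) tuples,
    where achieved_goal is the goal actually reached in s_next."""
    final_achieved = episode[-1][5]           # pretend the final achieved goal was the goal
    relabeled = []
    for (s, a, _, s_next, _, achieved) in episode:
        r_new = compute_reward(achieved, final_achieved)
        relabeled.append((s, a, r_new, s_next, final_achieved, achieved))
    return relabeled
```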
Mon 4 Nov 2024: RL for physics-based characters and robots
- DeepMimic, DeepLoco, Cassie bipedal robot control
- Blind Bipedal Stair Traversal
- Skills from Video, ALLSTEPS, RL for high-jumping
- [RL-humanoids+robots.pdf](RL-humanoids+robots.pdf)
Wed 6 Nov 2024: Imitation-based Learning
- Physics-Based Motion Retargeting from Sparse Inputs
[SCA 2023 & arXiv](https://arxiv.org/abs/2307.01938) - [video](https://www.youtube.com/embed/5D-DvX5scTk)
- CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control (arXiv) - [video](https://www.youtube.com/embed/O1tzbiDMW8U)
- PartwiseMPC: Interactive Control of Contact-Guided Motions (SCA 2024)
[paper](https://www.cs.ubc.ca/~van/papers/2024-partwiseMPC/index.html) - [video](https://www.youtube.com/embed/wdL9EvkIWZo)
- initial papers and discussion questions to be announced by Friday (discussions run Nov 18 onwards)
Mon 11 Nov 2024: no class
- Remembrance Day, UBC midterm break
Wed 13 Nov 2024: no class
- UBC midterm break
Mon 18 Nov 2024: Paper Discussion
- class replaced by guest lecture: 1pm, ICCS x836
Oliver Schulte, SFU: "When Should Reinforcement Learning Use Causal Reasoning?"
- related paper: ["Why Online Reinforcement Learning is Causal"](https://arxiv.org/pdf/2403.04221)
- Please: (a) attend the talk in person (preferred) or on Zoom (see Piazza @44 for the link); (b) make one public post to Piazza #paper_discussions for Nov 18, describing (i) a key insight that you learned from the talk and (ii) an aspect of the talk or material that you did not fully understand.
Wed 20 Nov 2024: Paper discussion: Decision Transformers
- [“Decision Transformer: Reinforcement Learning via Sequence Modeling”](https://arxiv.org/abs/2106.01345)
- Trajectory Transformers: ["Offline Reinforcement Learning as One Big Sequence Modeling Problem"](https://arxiv.org/pdf/2106.02039.pdf)
- see also: [“Training Agents using Upside-Down RL”](https://arxiv.org/abs/1912.02877) 2019
- Discussion questions:
1. How do transformer-based models differ from behavior cloning?
1. Why should they perform better than behavior cloning?
1. What is the purpose of the beam search in Trajectory Transformers?
1. What kind of data should these methods be trained on?
1. Describe aspects of either paper that your group did not understand.
Mon 25 Nov 2024: Paper Discussion: Multi-agent RL
- ["Multi-agent actor-critic for mixed cooperative-competitive environments"](https://arxiv.org/pdf/1706.02275.pdf), NeurIPS 2017
- Discussion questions:
1. Which RL algorithm is being used? For N agents, what is the number of policies and Q functions being trained?
1. What would be the benefit(s) and limitation(s) of using a single shared Q-function with multiple policies?
1. What would be the benefit(s) and limitation(s) of using a single shared policy?
1. What is the purpose of using an ensemble of policies? Might this also be achievable in other ways?
1. Describe aspects of the paper that your group did not understand.
Wed 27 Nov 2024: Paper discussion: Deep RL from human preferences
- Deep Reinforcement Learning from Human Preferences (NeurIPS 2017)
- [arXiv PDF](https://arxiv.org/pdf/1706.03741.pdf) - [OpenAI project webpage](https://openai.com/research/learning-from-human-preferences)
- Discussion questions:
1. To what extent are the ideas presented in the paper new?
1. What is a synthetic label? Give arguments for and against their use as a proxy.
1. Why not have the humans directly assign an estimated score for any motion?
1. Give three additional experiments that could have been added to improve the paper.
- followup work (optional to read):
- [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://cdn.jsdelivr.net/gh/LuYF-Lemon-love/susu-ChatGPT-papers/papers/07-DPO.pdf), NeurIPS 2023
Mon 2 Dec 2024: Offline RL
- [Reinforcement Learning with Large Datasets: a Path to Resourceful Autonomous Agents - Sergey Levine](https://www.youtube.com/watch?v=wraCgn27kVA)
- [Imitation learning vs. offline reinforcement learning - Sergey Levine](https://www.youtube.com/watch?v=sVPm7zOrBxM)
- related tutorial paper (optional): [Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems](https://arxiv.org/pdf/2005.01643)
- see also: [A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems](https://arxiv.org/pdf/2203.01387)
- Discussion Questions:
1. What are the advantages of offline RL?
1. What are the limitations of offline RL?
1. How is offline RL different from imitation learning?
1. Describe aspects of offline RL that your group did not understand.
Wed 4 Dec 2024: Wrap-up lecture
- course wrap-up (slides to be posted after class)
- time for completion of course evaluations