PhD Thesis Defense - Jonathan Wilder Lavington
Name: Jonathan Wilder Lavington
Date: Friday, October 4th
Time: 9:30 am
Location: ICCS X836
Zoom: https://ubc.zoom.us/j/66528654938?pwd=7bYCRtKeDVbIl2epDdzXnfBHqCFK4g.1
Supervisors: Mark Schmidt and Frank Wood
Title: Taking Advantage of Common Assumptions in Policy Optimization and Reinforcement Learning
Abstract:
This work considers training conditional probability distributions, called policies, in simulated environments using gradient-based optimization methods. It begins by investigating the effects that complex model classes have on settings where a policy is learned by imitating expert data gathered through repeated environmental interaction. Next, it discusses how to build a gradient-based optimizer tailored specifically to policy optimization settings where querying gradient information is expensive. We then consider policy optimization settings that contain imperfect expert demonstrations, and design an algorithm that utilizes additional information available during training to improve policy performance at test time and the efficiency of learning. Lastly, we consider how to generate behavioral data that satisfies hard constraints by using a combination of learned inference artifacts and a special variant of sequential Monte Carlo.