PhD Thesis Defense - Jonathan Wilder Lavington

Name: Jonathan Wilder Lavington

Date: Friday, October 4th

Time: 9:30 am

Location: ICCS X836

Zoom: https://ubc.zoom.us/j/66528654938?pwd=7bYCRtKeDVbIl2epDdzXnfBHqCFK4g.1

Supervisors: Mark Schmidt and Frank Wood

Title: Taking Advantage of Common Assumptions in Policy Optimization and Reinforcement Learning

Abstract:

This work considers the training of conditional probability distributions, called policies, in simulated environments via gradient-based optimization methods. It begins by investigating the effects of complex model classes in settings where a policy is learned by imitating expert data gathered through repeated environmental interaction. Next, it discusses how to build a gradient-based optimizer tailored specifically to policy-optimization settings in which querying gradient information is expensive. We then consider policy-optimization settings with imperfect expert demonstrations, and design an algorithm that uses additional information available during training to improve both test-time policy performance and learning efficiency. Lastly, we consider how to generate behavioral data that satisfies hard constraints, using a combination of learned inference artifacts and a specialized variant of sequential Monte Carlo.