Non-parametric Contextual Bandits
By John Chia, UBC Computer Science
Bandit problems are a class of online learning problems in which a learning algorithm plays a sequence of rounds and, at each round, chooses one of K actions. In the contextual case, each round also comes with side information (a context) that can be used to decide which action to take. The objective is to maximize the cumulative reward over all rounds. Many online recommendation problems can be formulated in this framework. I will begin by introducing the contextual bandit problem, then briefly discuss a selection of existing work and present our non-parametric extension based on Gaussian processes. Finally, I will discuss the results of our experiments.
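To make the round-by-round protocol concrete, here is a minimal sketch of a contextual bandit loop driven by a Gaussian process. It uses a generic GP-UCB-style rule (fit a GP to past (context, action) pairs and pick the action with the highest mean plus beta times standard deviation). This is an illustrative assumption, not the method presented in the talk. The reward environment, the RBF kernel, the one-hot action encoding, and the names `rbf_kernel`, `gp_posterior`, `run_gp_ucb`, and `beta` are all hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(X, y, Xq, noise=0.1):
    # Standard GP regression: posterior mean and variance at query points Xq.
    K = rbf_kernel(X, X) + noise**2 * np.eye(len(X))
    Ks = rbf_kernel(X, Xq)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v**2, 0)  # prior variance k(x, x) = 1 for this kernel
    return mean, np.maximum(var, 1e-12)

def run_gp_ucb(T=100, K=3, d=2, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical environment: each action has a preferred region of context space.
    centers = rng.normal(size=(K, d))

    def reward(x, a):
        return np.exp(-np.sum((x - centers[a]) ** 2)) + 0.1 * rng.normal()

    X_hist, y_hist, total = [], [], 0.0
    actions = np.eye(K)  # one-hot encoding of each action
    for _ in range(T):
        x = rng.normal(size=d)  # observe this round's context
        # One GP input per action: the context concatenated with the action encoding.
        cand = np.hstack([np.tile(x, (K, 1)), actions])
        if X_hist:
            mu, var = gp_posterior(np.array(X_hist), np.array(y_hist), cand)
            a = int(np.argmax(mu + beta * np.sqrt(var)))  # optimistic choice
        else:
            a = int(rng.integers(K))  # no data yet: pick an action at random
        r = reward(x, a)  # only the chosen action's reward is revealed
        X_hist.append(cand[a])
        y_hist.append(r)
        total += r
    return total

print(run_gp_ucb())
```

Because the GP places no fixed parametric form on how reward depends on the context and action, the same loop works for nonlinear reward functions; the `beta` term trades off exploiting the current mean estimate against exploring actions whose reward is still uncertain.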