Non-parametric Contextual Bandits
By John Chia, UBC Computer Science
Abstract:
Bandit problems are a class of online learning problems in which a learning
algorithm plays a sequence of rounds and, at each round, chooses one of K
actions and receives a reward for it. In the contextual case, each round also
comes with side information (a context) that can be used to decide which action
to take. The objective is to maximize the accumulated reward. Many online recommendation
problems can be formulated in this framework. I will begin by introducing the
contextual bandit problem, then briefly discuss a selection of existing work, and
present our non-parametric extension based on Gaussian processes. I will also
discuss the results of our experiments.
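
For readers new to the setting, the round-by-round protocol described in the
abstract can be sketched in a few lines of Python. This is only an illustrative
sketch and not the method presented in the talk: the choice of one Gaussian
process per action, the squared-exponential kernel, the upper-confidence-bound
selection rule, and every name and parameter below (rbf_kernel, GPArm,
run_bandit, length_scale, noise, beta) are assumptions made for the example.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


class GPArm:
    """Hypothetical GP regression model of one action's reward vs. context."""

    def __init__(self, noise=0.1):
        self.noise = noise      # assumed observation noise (std. deviation)
        self.contexts = []      # contexts seen when this action was played
        self.rewards = []       # rewards received for those plays

    def update(self, context, reward):
        self.contexts.append(context)
        self.rewards.append(reward)

    def posterior(self, context):
        """Return the posterior mean and standard deviation at one context."""
        if not self.contexts:
            return 0.0, 1.0  # zero-mean, unit-variance prior
        X = np.array(self.contexts)
        y = np.array(self.rewards)
        K = rbf_kernel(X, X) + self.noise**2 * np.eye(len(X))
        k_star = rbf_kernel(X, context[None, :])[:, 0]
        mean = k_star @ np.linalg.solve(K, y)
        var = 1.0 - k_star @ np.linalg.solve(K, k_star)
        return float(mean), float(np.sqrt(max(var, 0.0)))


def run_bandit(reward_fn, n_actions, n_rounds, beta=2.0, seed=0):
    """Play n_rounds of a contextual bandit with an assumed UCB rule."""
    rng = np.random.default_rng(seed)
    arms = [GPArm() for _ in range(n_actions)]
    total = 0.0
    for _ in range(n_rounds):
        # Side information for this round (here: one uniform feature).
        context = rng.uniform(0.0, 1.0, size=1)
        # Score each action by posterior mean plus an exploration bonus.
        scores = []
        for arm in arms:
            mean, std = arm.posterior(context)
            scores.append(mean + beta * std)
        action = int(np.argmax(scores))
        # Only the reward of the chosen action is observed.
        reward = reward_fn(context, action, rng)
        arms[action].update(context, reward)
        total += reward
    return total, arms
```

Here reward_fn stands in for the environment: it is called as
reward_fn(context, action, rng) and returns the reward of the chosen action
only, which is what distinguishes the bandit setting from ordinary supervised
learning.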