Date: December 16th, 2011

Morning session: 7:30-10:30 am

[7:30] Welcome and introduction
[7:50] Invited Talk 1: Remi Munos. Hierarchical bandits for function optimization
[8:20] Contributed Talk 1: Javad Azimi, Ali Jalali, and Xiaoli Fern. Dynamic Batch Bayesian Optimization
[8:40] Contributed Talk 2: Alexandre Passos, Jurgen Van Gael, Ralf Herbrich and Ulrich Paquet. A penny for your thoughts: the value of information in recommendation systems
[9:00] Break
[9:20] Spotlights of poster presentations
[9:50] Poster session 1

Afternoon session: 4:00-7:30 pm

[4:00] Invited Talk 2: Andreas Krause. Information-theoretic Regret Bounds for (Contextual) Gaussian Process
Bandit Optimization
[4:30] Invited Talk 3: Csaba Szepesvari.
[5:00] Poster session 2
[5:30] Break
[5:50] Invited Talk 4: Louis Dorard. Gaussian Process Bandits applied to Tree Search
[6:20] Panel discussion

Accepted papers (all to be presented as posters)

(where received, we also provide the posters)

Shipra Agrawal and Navin Goyal.
Analysis of Thompson Sampling for the multi-armed bandit problem [paper] [extended version]
Christoforos Anagnostopoulos and Robert B. Gramacy.
Dynamic trees for online analysis of massive data [paper]
Javad Azimi, Ali Jalali, and Xiaoli Fern.
Dynamic Batch Bayesian Optimization [paper]
James Bergstra.
Implementations of Algorithms for Hyper-parameter Selection [paper]
Nicolò Cesa-Bianchi and Sham Kakade.
An Optimal Algorithm for Linear Bandits [paper]
Nando de Freitas*, Alex Smola, and Masrour Zoghi.
Regret Bounds for GP Bandits Without Observation Noise [paper]
Katja Hofmann, Shimon Whiteson, and Maarten de Rijke.
Contextual Bandits for Information Retrieval [paper]
Philipp Hennig and Christian J. Schuler.
Information-Greedy Global Optimization [paper] [extended version]
Neil M. T. Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel.
Bayesian Active Learning for Gaussian Process Classification [paper] [poster]
Frank Hutter*, Holger Hoos, and Kevin Leyton-Brown.
Bayesian Optimization With Censored Response Data [paper] [poster]
M. Jala, C. Levy-Leduc and E. Moulines, E. Conil and J. Wiart.
Sequential Design of Computer Experiments for the Estimation of a Quantile with Application to Numerical Dosimetry [paper]
Emilie Kaufmann, Olivier Cappé, and Aurélien Garivier.
On the efficiency of Bayesian bandit algorithms from a frequentist point of view [paper]
Teodor Mihai Moldovan and Pieter Abbeel.
Bayesian Safe Exploration in Markov Decision Processes [paper]
Alexandre Passos, Jurgen Van Gael, Ralf Herbrich and Ulrich Paquet.
A penny for your thoughts: the value of information in recommendation systems [paper]
Jasper Snoek, Hugo Larochelle, and Ryan Prescott Adams.
Opportunity Cost in Bayesian Optimization [paper] [poster]
Matthew Tesch, Jeff Schneider, and Howie Choset.
Adapting Control Policies for Expensive Systems to Changing Environments [paper] [poster]

* Submissions by workshop organizers were entirely shielded from the respective organizer and reviewed double-blindly.

Abstracts for invited talks

Remi Munos. Hierarchical bandits for function optimization
The "optimism in the face of uncertainty" principle has been recently applied to complex sequential decision making and planning problems, such as the game of go. The idea is to use bandit algorithms to efficiently explore the set of possible strategies given noisy evaluations of their performance. In this talk I will review some recent theoretical works including the Hierarchical Optimistic Optimization (HOO) algorithm and other related hierarchical optimization algorithms (SOO). I will emphasize the specific assumptions about the function smoothness and the characterization of the problem complexity.
Andreas Krause. Information-theoretic Regret Bounds for (Contextual) Gaussian Process Bandit Optimization
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. We also present results for the contextual bandit setting, where in each round, additional information is provided to the decision maker and must be taken into account. We empirically evaluate the approach on sensor selection and automated vaccine design problems.
This is joint work with Niranjan Srinivas, Sham Kakade, Matthias Seeger and Cheng Soon Ong.
Louis Dorard: Gaussian Process Bandits applied to Tree Search
In bandit problems with many arms for which feature representations are given, we seek to learn a mapping from arm feature vectors to mean reward values. The LinRel algorithm, for instance, looks for linear mappings, and has also been extended to perform kernel Ridge Regression. Alternatively, Gaussian Processes can be used to model a prior belief on the smoothness of the mean-reward function f: it is likely that playing similar arms will yield similar rewards. The setting resembles that of GP noisy optimization, where f is the target and the reward variability is seen as noise.
In this talk, I will first review the GP-UCB algorithm that makes use of the principle of Optimism in the Face of Uncertainty in order to choose arms to play (i.e. samples to acquire, in the optimization setting). I will focus on the case where the number of arms is finite. We will then see how GPs can be used to model the smoothness of functions defined on tree paths, with covariances between paths based on the number of nodes in common, and we will see how a tree search problem can be approached with a single bandit algorithm for which tree paths are arms. The resulting algorithm, GPTS, can be made tractable whatever the size of the branching factor of the tree (which, with the maximum depth, determines the number of arms in the bandit problem). Finally, I will discuss the advantages and shortcomings of GPTS in comparison with "many-bandits" approaches to tree search, such as UCT: GPTS better deals with large branching factors, but has a higher computational cost.