Date: December 16th, 2011
Morning session: 7:30-10:30 am
- [7:30] Welcome and introduction
- [7:50] Invited Talk 1: Remi Munos. Hierarchical bandits for function optimization
- [8:20] Contributed Talk 1: Javad Azimi, Ali Jalali, and Xiaoli Fern. Dynamic Batch Bayesian Optimization
- [8:40] Contributed Talk 2: Alexandre Passos, Jurgen Van Gael, Ralf Herbrich and Ulrich Paquet. A penny for your thoughts: the value of information in recommendation systems
- [9:00] Break
- [9:20] Spotlights of poster presentations
- [9:50] Poster session 1
Afternoon session: 4:00-7:30 pm
- [4:00] Invited Talk 2: Andreas Krause. Information-theoretic Regret Bounds for (Contextual) Gaussian Process Bandit Optimization
- [4:30] Invited Talk 3: Csaba Szepesvári
- [5:00] Poster session 2
- [5:30] Break
- [5:50] Invited Talk 4: Louis Dorard. Gaussian Process Bandits applied to Tree Search
- [6:20] Panel discussion
Accepted papers (all to be presented as posters; where received, we also provide the posters)
- Shipra Agrawal and Navin Goyal. Analysis of Thompson Sampling for the multi-armed bandit problem [paper] [extended version]
- Christoforos Anagnostopoulos and Robert B. Gramacy. Dynamic trees for online analysis of massive data [paper]
- Javad Azimi, Ali Jalali, and Xiaoli Fern. Dynamic Batch Bayesian Optimization [paper]
- James Bergstra. Implementations of Algorithms for Hyper-parameter Selection [paper]
- Nicolò Cesa-Bianchi and Sham Kakade. An Optimal Algorithm for Linear Bandits [paper]
- Nando de Freitas*, Alex Smola, and Masrour Zoghi. Exponential Regret Bounds for GP Bandits with Deterministic Observation Noise [paper]
- Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. Contextual Bandits for Information Retrieval [paper]
- Philipp Hennig and Christian J. Schuler. Entropy Search for Information-Efficient Global Optimization [paper] [extended version]
- Neil M. T. Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. Bayesian Active Learning for Gaussian Process Classification [paper]
- Frank Hutter*, Holger Hoos, and Kevin Leyton-Brown. Bayesian Optimization With Censored Response Data [paper]
- M. Jala, C. Levy-Leduc, E. Moulines, E. Conil, and J. Wiart. Sequential Design of Computer Experiments for the Estimation of a Quantile with Application to Numerical Dosimetry [paper]
- Emilie Kaufmann, Olivier Cappé, and Aurélien Garivier. On the efficiency of Bayesian bandit algorithms from a frequentist point of view [paper]
- Teodor Mihai Moldovan and Pieter Abbeel. Safe Exploration in Markov Decision Processes [paper]
- Alexandre Passos, Jurgen Van Gael, Ralf Herbrich and Ulrich Paquet. A penny for your thoughts: the value of information in recommendation systems [paper]
- Jasper Snoek, Hugo Larochelle, and Ryan Prescott Adams. Opportunity Cost in Bayesian Optimization [paper]
- Matthew Tesch, Jeff Schneider, and Howie Choset. Adapting Control Policies for Expensive Systems to Changing Environments [paper] [poster]
* Submissions by workshop organizers were entirely shielded from the submitting organizer and reviewed double-blind.
Abstracts for invited talks
- Remi Munos. Hierarchical bandits for function optimization
The "optimism in the face of uncertainty" principle has been recently
applied to complex sequential decision making and planning problems,
such as the game of Go. The idea is to use bandit algorithms to
efficiently explore the set of possible strategies given noisy
evaluations of their performance. In this talk I will review some recent theoretical work, including the Hierarchical Optimistic Optimization (HOO) algorithm and related hierarchical optimization algorithms such as SOO. I will emphasize the specific
assumptions about the function smoothness and the characterization of
the problem complexity.
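To make the optimistic principle concrete, here is a minimal sketch of hierarchical optimization in this spirit, assuming a one-dimensional deterministic objective and a known smoothness constant L (closer to the deterministic variants than to HOO, which additionally handles noisy rewards). The splitting rule, budget, and test function below are illustrative assumptions, not the algorithms from the talk.

```python
import math

def hierarchical_optimize(f, lo=0.0, hi=1.0, budget=100, L=10.0):
    """Repeatedly split the interval whose optimistic bound
    f(midpoint) + L * (width / 2) is largest."""
    mid = (lo + hi) / 2.0
    leaves = [(lo, hi, f(mid))]        # (left, right, value at interval center)
    best_x, best_val = mid, leaves[0][2]
    for _ in range(budget):
        # Optimistic upper bound: observed center value plus smoothness slack.
        a, b, v = max(leaves, key=lambda t: t[2] + L * (t[1] - t[0]) / 2.0)
        leaves.remove((a, b, v))
        m = (a + b) / 2.0
        for l, r in ((a, m), (m, b)):  # split the chosen cell in two
            c = (l + r) / 2.0
            val = f(c)
            leaves.append((l, r, val))
            if val > best_val:
                best_x, best_val = c, val
    return best_x, best_val

# Example: a multimodal test function on [0, 1].
x_best, f_best = hierarchical_optimize(lambda x: math.sin(13 * x) * math.sin(27 * x))
```

The key design choice, as in HOO/SOO, is that the smoothness term shrinks with cell width, so exploration is automatically concentrated near the optimum.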
- Andreas Krause. Information-theoretic Regret Bounds for (Contextual) Gaussian Process Bandit Optimization
Many applications require optimizing an unknown, noisy function that
is expensive to evaluate. We formalize this task as a multi-armed
bandit problem, where the payoff function is either sampled from a
Gaussian process (GP) or has low RKHS norm. We resolve the important
open problem of deriving regret bounds for this setting, which imply
novel convergence rates for GP optimization. We analyze GP-UCB, an
intuitive upper-confidence based algorithm, and bound its cumulative
regret in terms of maximal information gain, establishing a novel
connection between GP optimization and experimental design. Moreover,
by bounding the latter in terms of operator spectra, we obtain
explicit sublinear regret bounds for many commonly used covariance
functions. In some important cases, our bounds have surprisingly weak
dependence on the dimensionality. We also present results for the
contextual bandit setting, where in each round, additional information
is provided to the decision maker and must be taken into account. We
empirically evaluate the approach on sensor selection and automated
vaccine design problems.
This is joint work with Niranjan Srinivas, Sham Kakade, Matthias
Seeger and Cheng Soon Ong.
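As a rough illustration of the GP-UCB rule analyzed in this talk, the numpy sketch below computes the GP posterior over a finite arm set in closed form and plays the arm maximizing mean plus scaled standard deviation. The RBF kernel, noise level, and constant beta are assumptions for illustration; the regret analysis uses a specific growing schedule for beta_t.

```python
import numpy as np

def rbf(A, B, ls=0.1):
    # Squared-exponential kernel on scalar arms.
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_ucb(f, arms, T=50, noise=0.1, beta=2.0):
    arms = np.asarray(arms)
    X, y = [], []
    for t in range(T):
        if not X:
            mu, sd = np.zeros(len(arms)), np.ones(len(arms))
        else:
            Xa, ya = np.array(X), np.array(y)
            K = rbf(Xa, Xa) + noise**2 * np.eye(len(Xa))
            Ks = rbf(arms, Xa)
            mu = Ks @ np.linalg.solve(K, ya)               # posterior mean
            var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
            sd = np.sqrt(np.maximum(var, 1e-12))           # posterior std
        # Optimism: play the arm with the highest upper confidence bound.
        x = arms[int(np.argmax(mu + np.sqrt(beta) * sd))]
        X.append(x)
        y.append(f(x) + noise * np.random.randn())
    return np.array(X), np.array(y)

# Example: noisy maximization of sin(3x) over 100 arms on [0, 1].
xs, ys = gp_ucb(lambda x: np.sin(3 * x), np.linspace(0, 1, 100))
```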
- Louis Dorard. Gaussian Process Bandits applied to Tree Search
In bandit problems with many arms for which feature representations are
given, we seek to learn a mapping from arm feature vectors to mean
reward values. The LinRel algorithm, for instance, looks for linear
mappings, and has also been extended to perform kernel Ridge
Regression. Alternatively, Gaussian Processes can be used to model a
prior belief on the smoothness of the mean-reward function f: it is
likely that playing similar arms will yield similar rewards. The
setting resembles that of GP noisy optimization, where f is the target
and the reward variability is seen as noise.
In this talk, I will first review the GP-UCB algorithm that makes use
of the principle of Optimism in the Face of Uncertainty in order to
choose arms to play (i.e. samples to acquire, in the optimization
setting). I will focus on the case where the number of arms is finite.
We will then see how GPs can be used to model the smoothness of
functions defined on tree paths, with covariances between paths based
on the number of nodes in common, and we will see how a tree search
problem can be approached with a single bandit algorithm for which tree
paths are arms. The resulting algorithm, GPTS, can be made tractable regardless of the size of the tree's branching factor (which, together with the maximum depth, determines the number of arms in the bandit problem).
Finally, I will discuss the advantages and shortcomings of GPTS in comparison with "many-bandits" approaches to tree search, such as UCT: GPTS better deals with large branching factors, but has a higher computational cost.
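The path covariance described above can be sketched as follows, assuming paths are encoded as tuples of child indices and that covariance decays geometrically with the point at which two paths diverge; the exact functional form used in GPTS may differ, and the decay rate rho is a hypothetical parameter.

```python
import numpy as np
from itertools import product

def shared_nodes(p, q):
    """Count the nodes two root-to-leaf paths have in common (root included)."""
    n = 1                        # every path contains the root
    for a, b in zip(p, q):
        if a != b:
            break
        n += 1
    return n

def path_kernel(paths, rho=0.5):
    """K[i, j] = rho ** (depth - shared_nodes(i, j)): covariance 1 for
    identical paths, decaying geometrically the earlier two paths diverge."""
    depth = len(paths[0]) + 1    # nodes per path = edges + root
    K = np.empty((len(paths), len(paths)))
    for i, p in enumerate(paths):
        for j, q in enumerate(paths):
            K[i, j] = rho ** (depth - shared_nodes(p, q))
    return K

# Example: all 8 root-to-leaf paths of a depth-3 binary tree.
paths = list(product((0, 1), repeat=3))
K = path_kernel(paths)           # 8x8 covariance matrix over paths
```

Such a kernel encodes exactly the prior belief stated in the abstract: playing similar arms (paths sharing many nodes) is expected to yield similar rewards.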