Programmes
Evénements
Articles
Transition
Demographic transition
Environmental transition
Financial transition
Digital transition
We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset of a specified size. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the ``Chernoff information'' between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between strategies based on uniform and adaptive sampling for pure-exploration problems, finding evidence in favor of the latter.
Thompson Sampling has been demonstrated in many complex bandit models, however the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for one-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for Kullback-Leibler divergence and Fisher information (through the Jeffreys prior) available in an exponential family. This allow us to give a finite time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families.
Your name* Your email* Your message*
*required fields