First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

Open Access
Authors
  • Chen-Yu Wei
Publication date 2023
Host editors
  • A. Oh
  • T. Naumann
  • A. Globerson
  • K. Saenko
  • M. Hardt
  • S. Levine
Book title 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Book subtitle 10-16 December 2023, New Orleans, Louisiana, USA
ISBN (electronic)
  • 9781713899921
Series Advances in Neural Information Processing Systems
Event 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Pages (from-to) 61625-61644
Number of pages 20
Publisher Neural Information Processing Systems Foundation
Organisations
  • Faculty of Science (FNWI) - Korteweg-de Vries Institute for Mathematics (KdVI)
Abstract
We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of K arms to change over time without restriction. Assuming the d-dimensional contexts are drawn from a fixed known distribution, the worst-case expected regret over the course of T rounds is known to scale as Õ(√(KdT)). Under the additional assumption that the density of the contexts is log-concave, we obtain a second-order bound of order Õ(K√(dV_T)) in terms of the cumulative second moment of the learner's losses V_T, and a closely related first-order bound of order Õ(K√(dL_T)) in terms of the cumulative loss of the best policy L_T. Since V_T or L_T may be significantly smaller than T, these improve over the worst-case regret whenever the environment is relatively benign. Our results are obtained using a truncated version of the continuous exponential weights algorithm over the probability simplex, which we analyse by exploiting a novel connection to the linear bandit setting without contexts.
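The paper's algorithm is a truncated continuous exponential weights method over the probability simplex; as background intuition only, the sketch below shows the classic discrete exponential-weights bandit algorithm (EXP3) with importance-weighted loss estimates, which the continuous variant generalises. All names and parameter choices here are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def exp3(loss_matrix, eta, seed=0):
    """Classic EXP3 sketch: exponential weights with importance-weighted
    loss estimates for an adversarial K-armed bandit (no contexts).

    loss_matrix: (T, K) array of losses in [0, 1]; the learner only
    observes the loss of the arm it pulls each round.
    eta: learning rate (illustrative; not tuned as in the paper).
    Returns the learner's total realised loss.
    """
    T, K = loss_matrix.shape
    rng = np.random.default_rng(seed)
    cum_est = np.zeros(K)      # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(T):
        # exponential weights, shifted by the minimum for numerical stability
        w = np.exp(-eta * (cum_est - cum_est.min()))
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        loss = loss_matrix[t, arm]
        total_loss += loss
        # unbiased estimate of the full loss vector: observed loss / pull prob
        cum_est[arm] += loss / p[arm]
    return total_loss
```

On a benign instance where one arm is consistently better, the weights concentrate quickly and the learner's cumulative loss tracks that of the best arm, which is the kind of data-dependent behaviour the first- and second-order bounds in the paper quantify.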
Document type Conference contribution
Language English
Published at https://papers.nips.cc/paper_files/paper/2023/hash/c2201e444d2b22a10ca50116a522b9a9-Abstract-Conference.html
Other links https://doi.org/10.52202/075280