Università di Genova | MaLGa
Seminar

Non-Stationary Delayed Bandits with Intermediate Observations

18/10/2021

Speaker

Claire Vernade - DeepMind (UK)


Abstract

We consider the problem of learning by trial and error from delayed bandit feedback in changing environments. This problem is ubiquitous in online recommender systems, which show content that is ultimately evaluated by long-term metrics such as a purchase or watch time. Mitigating the effects of delays in stationary environments is well understood, but the problem becomes much more challenging when the environment changes. In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete. However, these issues can be addressed if relevant intermediate signals are available without delay, such that, given those signals, the long-term behavior of the system is stationary. To model this situation, we introduce the problem of stochastic, non-stationary and delayed bandits with intermediate observations. We develop a computationally efficient algorithm based on UCRL and prove sublinear regret guarantees for its performance.


Bio

Claire is a Research Scientist at DeepMind in London, UK. She received her PhD from Telecom ParisTech in October 2017, under the guidance of Prof. Olivier Cappé. From January to October 2018, she worked part-time as an Applied Scientist at Amazon in Berlin, while doing a post-doc with Alexandra Carpentier at the University of Magdeburg in Germany. Her research is on sequential decision making. It mostly spans bandit problems, but Claire's interest also extends to Reinforcement Learning and Learning Theory. While keeping in mind concrete problems, often inspired by interactions with product teams, she focuses on theoretical approaches, aiming for provably optimal algorithms. She recently received an Outstanding Paper Award at ICLR for joint work on a game-theoretic approach to PCA.


When

2021-10-18 at 3:00 pm


Where

Remote, @UniGE