Publications

How to Guide a Non-Cooperative Learner to Cooperate: Exploiting No-Regret Algorithms in System Design

Published in Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, 2021

We investigate a repeated two-player game setting where the column player is also the designer of the system and has full control over the payoff matrices. In addition, we assume that the row player uses a no-regret algorithm to efficiently learn how to adapt their strategy to the column player's behaviour over time. The goal of the column player is to guide her opponent into picking a mixed strategy preferred by the system designer. To do so, she needs to: (i) design appropriate payoffs for both players; and (ii) interact strategically with the row player over a sequence of plays so as to steer them towards the desired mixed strategy.
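To illustrate the kind of learning dynamics the row player is assumed to follow, here is a minimal sketch (not the paper's method) of a row player running multiplicative weights (Hedge) against a fixed column strategy; the matrix `A`, strategy `y`, step size `eta`, and horizon `T` below are illustrative choices of my own, and in the paper's setting the column player would instead adapt her play over time to steer the row player's average strategy.

```python
import numpy as np

def hedge_dynamics(A, y, T=1000, eta=0.1):
    """Row player runs multiplicative weights (Hedge) against a fixed
    column mixed strategy y, on row payoff matrix A (rows = row actions)."""
    n = A.shape[0]
    weights = np.ones(n)
    avg_play = np.zeros(n)
    for _ in range(T):
        x = weights / weights.sum()        # row player's current mixed strategy
        avg_play += x
        payoffs = A @ y                    # expected payoff of each row action
        weights *= np.exp(eta * payoffs)   # Hedge update (payoff maximisation)
    return avg_play / T                    # empirical average row strategy

# Toy 2x2 game; the average play concentrates on the best response to y.
A = np.array([[3.0, 0.0],
              [1.0, 2.0]])
y = np.array([0.5, 0.5])
print(hedge_dynamics(A, y))
```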

Recommended citation: Nicholas Bishop, Le Cong Dinh, Long Tran-Thanh. "How to Guide a Non-Cooperative Learner to Cooperate: Exploiting No-Regret Algorithms in System Design". In: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2021)

Optimal Learning from Verified Training Data

Published in Advances in Neural Information Processing Systems 33, 2020

Standard machine learning algorithms typically assume that data is sampled independently from the distribution of interest. Attempts to relax this assumption, such as in adversarial learning, typically assume instead that data is provided by an adversary whose sole objective is to fool the learning algorithm. In reality, however, data often comes from self-interested agents whose goals are less malicious and lie somewhere between these two settings. To tackle this problem, we present a Stackelberg competition model for least squares regression, in which data is provided by agents who wish to achieve specific predictions for their data.
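As a rough, hedged toy sketch of the Stackelberg flavour of this setting (not the model analysed in the paper): the learner commits to a least squares rule, and a single self-interested agent then best-responds by choosing the label it reports so that the learner's prediction at its own feature vector lands as close as possible to a desired target. All names, the grid search, and the regularisation term below are illustrative assumptions of mine.

```python
import numpy as np

def least_squares_fit(X, y, lam=1e-6):
    """Learner's committed rule: (lightly regularised) least squares."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def best_response_label(X, y_reported, i, x_i, target, grid):
    """Agent i picks the reported label that brings the learner's prediction
    at x_i closest to the agent's desired prediction (simple grid search)."""
    best, best_gap = y_reported[i], np.inf
    for y_hat in grid:
        y_try = y_reported.copy()
        y_try[i] = y_hat
        w = least_squares_fit(X, y_try)
        gap = (x_i @ w - target) ** 2
        if gap < best_gap:
            best, best_gap = y_hat, gap
    return best

# Toy illustration: agent 0 wants the prediction at its point to be 5.0
# rather than its true label of roughly 1.0.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y_true = X @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=20)
y_rep = y_true.copy()
y_rep[0] = best_response_label(X, y_rep, 0, X[0], target=5.0,
                               grid=np.linspace(-10, 10, 201))
w = least_squares_fit(X, y_rep)
print("prediction at agent 0's point:", X[0] @ w)
```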

Recommended citation: Nicholas Bishop, Long Tran-Thanh, Enrico Gerding. "Optimal Learning from Verified Training Data". In: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

Adversarial Blocking Bandits

Published in Advances in Neural Information Processing Systems 33, 2020

We consider a general adversarial multi-armed blocking bandit setting where each played arm can be blocked (made unavailable) for some number of time periods, and the reward of each arm is chosen adversarially at each time period, without following any fixed distribution. The setting models the allocation of scarce, reusable resources (the arms), which replenish and can be used again only after a certain number of time periods.
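To make the blocking mechanic concrete, here is a minimal sketch of a full-information greedy baseline: at each round it plays the highest-reward arm that is not currently blocked, and playing an arm blocks it for a given number of subsequent rounds. This is an illustration of the setting only, not the algorithm or guarantees from the paper; the reward and delay matrices below are my own toy inputs (random values standing in for adversarial ones).

```python
import numpy as np

def greedy_blocking_play(rewards, delays):
    """rewards[t, k]: reward of arm k at round t (chosen adversarially).
    delays[t, k]: rounds arm k stays blocked after being played at round t."""
    T, K = rewards.shape
    available_from = np.zeros(K, dtype=int)   # earliest round each arm is usable
    total = 0.0
    for t in range(T):
        usable = np.where(available_from <= t)[0]
        if usable.size == 0:
            continue                          # every arm is currently blocked
        k = usable[np.argmax(rewards[t, usable])]
        total += rewards[t, k]
        available_from[k] = t + 1 + delays[t, k]   # block arm k for delays[t, k] rounds
    return total

# Toy instance: 4 arms, 50 rounds, blocking durations between 1 and 3.
rng = np.random.default_rng(1)
T, K = 50, 4
print(greedy_blocking_play(rng.random((T, K)), rng.integers(1, 4, size=(T, K))))
```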

Recommended citation: Nicholas Bishop, Hau Chan, Debmalya Mandal, Long Tran-Thanh. "Adversarial Blocking Bandits". In: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)