Abstract
Multi-armed bandits (MAB) and causal MABs (CMAB) are established frameworks for decision-making problems. Most prior work studies and solves individual MABs and CMABs in isolation, for a given problem and its associated data. However, decision-makers often face multiple related problems and multi-scale observations, where joint formulations are needed to exploit problem structure and data dependencies efficiently. Transfer learning for CMABs addresses the setting where models are defined on identical variables, although their causal connections may differ. In this work, we extend transfer learning to setups involving CMABs defined on potentially different variables, with varying degrees of granularity, and related via an abstraction map. Formally, we introduce the problem of causally abstracted MABs (CAMABs), relying on the theory of causal abstraction to express a rigorous abstraction map. We propose algorithms to learn in a CAMAB and study their regret. We illustrate the limitations and strengths of our algorithms on a real-world scenario related to online advertising.
URL
https://arxiv.org/abs/2404.17493