Multi-agent reinforcement learning for nonzero-sum Markov games
Mao, Weichao
Permalink
https://hdl.handle.net/2142/124295
Description
- Title
- Multi-agent reinforcement learning for nonzero-sum Markov games
- Author(s)
- Mao, Weichao
- Issue Date
- 2024-04-22
- Director of Research (if dissertation) or Advisor (if thesis)
- Başar, Tamer
- Doctoral Committee Chair(s)
- Başar, Tamer
- Committee Member(s)
- Iyer, Ravishankar K.
- Srikant, Rayadurgam
- Raginsky, Maxim
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Reinforcement learning
- Game theory
- Abstract
- In recent years, multi-agent reinforcement learning (MARL) has shown remarkable capabilities in addressing sequential decision-making problems that involve the strategic interactions of more than one decision-maker. Motivated by these empirical successes, many research efforts have been devoted to laying the theoretical foundations of MARL. In this dissertation, we contribute to this line of theoretical research by developing MARL algorithms with convergence and sample complexity guarantees in nonzero-sum Markov games, a regime that has received little attention in prior research. First, we design sample-efficient MARL algorithms for learning (coarse) correlated equilibria in general-sum Markov games. Our algorithms integrate variants of optimistic Q-learning for efficient exploration with uncoupled no-regret learning for policy updates (an illustrative sketch of such an update rule follows the record fields below). These algorithms are decentralized in the sense that each agent makes decisions based only on its local information, with no need for communication or central coordination. We establish sample complexity guarantees for our algorithms, which appear to be the first such guarantees for decentralized MARL in general-sum Markov games. Second, we study reinforcement learning (RL) under environmental non-stationarity, a major challenge faced by MARL agents. When both the reward functions and the state transition distributions may vary over time, we propose a simple but effective restart-based algorithm tailored to such non-stationary environments. We analyze the dynamic regret of our algorithm and show that it is near-optimal by establishing an almost matching information-theoretic lower bound. We demonstrate that our non-stationary RL method can be readily applied to learning team-optimal policies in a specific class of cooperative games with slowly changing opponents. Third, we propose to use meta-learning to transfer useful information across multiple MARL tasks so that related tasks can be learned collectively and more efficiently. We establish the first line of theoretical results for meta-learning in a wide range of fundamental MARL settings, including learning Nash equilibria in two-player zero-sum Markov games and Markov potential games, as well as learning coarse correlated equilibria in general-sum Markov games. Under natural notions of task similarity, we show that meta-learning achieves provably sharper convergence to various game-theoretic solution concepts than learning each task separately. Numerical results are provided to corroborate our theoretical findings. Finally, we conclude the dissertation and discuss future research directions.
- Graduation Semester
- 2024-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2024 Weichao Mao
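
The abstract's first contribution combines optimistic Q-learning-style value estimates for exploration with uncoupled no-regret learning for policy updates, run independently by each agent. The following is a minimal, illustrative sketch of what one agent's local update of that flavor might look like in a tabular episodic setting; the class name, constants (exploration bonus, step sizes), and the Hedge-style update are illustrative assumptions, not the dissertation's actual algorithm.

```python
import numpy as np

class DecentralizedAgentSketch:
    """Illustrative sketch of one agent's local learning rule: an optimistic
    value estimate for exploration plus a Hedge-style (multiplicative-weights)
    no-regret policy update. Constants and structure are placeholders, not the
    dissertation's exact algorithm."""

    def __init__(self, n_states, n_actions, horizon, bonus=1.0, lr=0.1):
        self.n_actions = n_actions
        self.horizon = horizon
        self.bonus = bonus          # exploration bonus scale (placeholder)
        self.lr = lr                # Hedge learning rate (placeholder)
        # Optimistic initialization encourages exploration early on.
        self.v = np.full(n_states, float(horizon))
        self.counts = np.zeros((n_states, n_actions), dtype=int)
        self.weights = np.ones((n_states, n_actions))

    def policy(self, state):
        w = self.weights[state]
        return w / w.sum()

    def act(self, state, rng):
        # Sample an action from the agent's current local policy.
        return int(rng.choice(self.n_actions, p=self.policy(state)))

    def update(self, state, action, reward, next_state):
        # Each agent updates from its own trajectory only: no communication.
        self.counts[state, action] += 1
        n = self.counts[state, action]
        alpha = (self.horizon + 1) / (self.horizon + n)   # decaying step size
        target = reward + self.v[next_state] + self.bonus / np.sqrt(n)
        # Optimistic value update, clipped at the maximum possible return.
        self.v[state] = (1 - alpha) * self.v[state] + alpha * min(target, self.horizon)
        # Hedge-style no-regret policy update from bandit feedback:
        # importance-weighted loss estimate for the action actually played.
        loss = (self.horizon - min(target, self.horizon)) / self.horizon
        est_loss = loss / max(self.policy(state)[action], 1e-8)
        self.weights[state, action] *= np.exp(-self.lr * est_loss)
        self.weights[state] /= self.weights[state].sum()  # numerical stability
```

In use, each player of the game would hold one such agent and interact with the shared environment through its own observations and rewards only, matching the decentralized setting the abstract describes.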
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)