Batch value function tournament for offline policy selection in reinforcement learning
Zhang, Siyuan
Permalink: https://hdl.handle.net/2142/110602
Description
Title
Batch value function tournament for offline policy selection in reinforcement learning
Author(s)
Zhang, Siyuan
Issue Date
2021-04-29
Director of Research (if dissertation) or Advisor (if thesis)
Jiang, Nan
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Model Selection
Reinforcement Learning, Batch, Offline, RL
Abstract
Offline policy selection is a challenging open problem in reinforcement learning with many important applications. The recently proposed Batch Value Function Tournament (BVFT) algorithm for batch learning offers appealing theoretical properties and can be applied to the model selection problem. In this thesis, we propose several changes to the original algorithm to adapt it to the task of offline model selection. We conduct comprehensive experiments with BVFT for policy selection across a variety of domains to evaluate and analyze its performance. We show that BVFT achieves good performance in comparison with a number of state-of-the-art approaches, and we demonstrate that BVFT is a reliable option for policy selection in offline reinforcement learning.
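The abstract does not spell out the algorithm or the thesis's modifications to it. As a rough illustration only, the pairwise-tournament idea behind BVFT can be sketched as follows: each candidate Q-function is scored by its worst-case projected Bellman error over all pairwise partitions, where each partition groups transitions by the discretized values of a pair of candidates, and the candidate with the smallest worst-case loss is selected. The function and data-format names below are hypothetical, and this is a minimal sketch, not the thesis's implementation.

```python
import math
from collections import defaultdict

def bvft_select(q_funcs, data, gamma, resolution=1e-3):
    """Return the index of the candidate Q-function with the smallest
    worst-case pairwise BVFT-style loss.

    q_funcs: list of callables q(s, a) -> float (candidate Q-functions)
    data:    list of transitions (s, a, r, s_next, next_actions)
    """
    def pairwise_loss(qi, qj):
        # Empirical Bellman targets under candidate qi.
        targets = [r + gamma * max(qi(s2, a2) for a2 in acts)
                   for (s, a, r, s2, acts) in data]
        # Partition transitions by the jointly discretized values of (qi, qj).
        groups = defaultdict(list)
        for k, (s, a, r, s2, acts) in enumerate(data):
            key = (round(qi(s, a) / resolution), round(qj(s, a) / resolution))
            groups[key].append(k)
        # Project the targets onto piecewise-constant functions over the
        # partition (i.e., replace each target with its group mean), then
        # measure how far qi is from that projection.
        sq_err = 0.0
        for idxs in groups.values():
            mean_t = sum(targets[k] for k in idxs) / len(idxs)
            sq_err += sum((qi(data[k][0], data[k][1]) - mean_t) ** 2
                          for k in idxs)
        return math.sqrt(sq_err / len(data))

    # Tournament: each candidate is scored by its worst pairing.
    losses = [max(pairwise_loss(qi, qj) for qj in q_funcs) for qi in q_funcs]
    return min(range(len(q_funcs)), key=lambda i: losses[i])
```

On data from a deterministic MDP, a Q-function that satisfies the Bellman equation on the observed transitions incurs zero loss against every opponent, while an inconsistent candidate is penalized, so the consistent candidate wins the tournament.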