Improved worst-case regret bounds for randomized least-squares value iteration
Agrawal, Priyank
Permalink
https://hdl.handle.net/2142/113048
Description
Title
Improved worst-case regret bounds for randomized least-squares value iteration
Author(s)
Agrawal, Priyank
Issue Date
2021-07-15
Advisor
Jiang, Nan
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Reinforcement Learning
Exploration-Exploitation
Abstract
This work studies regret minimization with randomized value functions in reinforcement learning. In tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of a classical Thompson Sampling (TS)-like algorithm, randomized least-squares value iteration (RLSVI). Our $\tilde{\mathrm{O}}(H^2S\sqrt{AT})$ high-probability worst-case regret bound improves upon the previously sharpest worst-case regret bound for RLSVI and matches the existing state-of-the-art worst-case regret bounds for TS-based algorithms.
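For readers unfamiliar with RLSVI, the sketch below illustrates the general idea in the tabular finite-horizon setting: backward value iteration on an empirical model, with Gaussian perturbations that shrink with visit counts and a clipping step that keeps the randomized values in a feasible range. This is a minimal sketch of the generic technique, not the thesis's exact algorithm; the function name `rlsvi_episode`, the noise-scale parameter `beta`, and the count/model arrays are illustrative assumptions.

```python
import numpy as np

def rlsvi_episode(counts, r_hat, p_hat, H, beta=1.0, rng=None):
    """One planning pass of a tabular RLSVI-style agent (illustrative sketch).

    counts[h, s, a]    : visit counts per stage/state/action
    r_hat[h, s, a]     : empirical mean rewards (assumed to lie in [0, 1])
    p_hat[h, s, a, s'] : empirical transition probabilities
    beta               : noise-scale parameter (hypothetical tuning constant)

    Returns a randomized Q table; acting greedily with respect to it
    is what drives exploration in this family of algorithms.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, S, A = counts.shape
    Q = np.zeros((H + 1, S, A))          # boundary condition: stage-H values are 0
    for h in range(H - 1, -1, -1):
        V_next = Q[h + 1].max(axis=1)    # greedy value of the next stage
        # Gaussian perturbation whose scale shrinks with the visit count,
        # in the spirit of Thompson Sampling / randomized value functions
        sigma = beta / np.sqrt(np.maximum(counts[h], 1))
        noise = rng.normal(0.0, sigma)
        Q[h] = r_hat[h] + p_hat[h] @ V_next + noise
        # Clipping step: keep the randomized values inside the range any true
        # value function could take (here, rewards assumed to lie in [0, 1])
        Q[h] = np.clip(Q[h], 0.0, H - h)
    return Q[:H]


# Toy usage with placeholder statistics (uniform transitions, zero rewards)
S, A, H = 3, 2, 4
counts = np.zeros((H, S, A))
r_hat = np.zeros((H, S, A))
p_hat = np.full((H, S, A, S), 1.0 / S)
Q = rlsvi_episode(counts, r_hat, p_hat, H)
first_action = Q[0, 0].argmax()          # greedy action in the randomized Q
```

Clipping the randomized values to a valid range is one common way such variants control the effect of the injected noise; how this relates to the regret improvement claimed in the abstract is developed in the thesis itself.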