Statistical problems with deterministic reinforcement learning and small sample biases

Saleh, Ehsan

Permalink

https://hdl.handle.net/2142/124270

Description

Title

Statistical problems with deterministic reinforcement learning and small sample biases

Author(s)

Saleh, Ehsan

Issue Date

2024-04-15

Director of Research (if dissertation) or Advisor (if thesis)

Bretl, Timothy
West, Matthew

Doctoral Committee Chair(s)

Bretl, Timothy
West, Matthew

Committee Member(s)

Forsyth, David
Jiang, Nan
Cheng, Ching-An

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Reinforcement Learning
Deterministic Search
Truly Deterministic Policy Optimization
Policy Gradient
Model-Free Reinforcement Learning
Robotics
Scientific Machine Learning
Delayed Target
Physics-Informed Neural Networks

Abstract

This dissertation focuses on two main problems: (1) building a truly deterministic policy optimization method suitable for challenging robotic applications, and (2) addressing the small-sample biases associated with learning from integral losses in the context of scientific machine learning. These two topics are closely related to the policy gradient and the approximate dynamic programming techniques within reinforcement learning. The first part of the dissertation targets the practical difficulties of reinforcement learning regarding realistic robotic artifacts, such as the necessity to define non-local rewards, long decision-making horizons, and control systems with resonant frequencies. First, we derive a Wasserstein-based trust region lower bound of the payoff specifically for deterministic policy search purposes. This plays a key role in regulating the policy updates of our deterministic search method. Based on this, we then formally introduce our truly deterministic policy optimization method. The key feature of this method lies in its ability to avoid the need for exploratory noise injection. This allows our method to solve the aforementioned practical challenges safely and effectively. The last part of the dissertation focuses on the challenges involving integral loss functions while solving partial integro-differential equations with physics-informed neural networks. Such challenges are similar in nature to those encountered when solving the Bellman equation in reinforcement learning. In particular, we focus on the small-sample biases resulting from naive approximations to estimate the integral loss functions. We explore three potential solutions to this problem, including the deterministic trick, the double-sampling trick, and the delayed target method. Finally, we provide three numerical problems to extensively evaluate these potential solutions.
This work is mainly inspired by the existing literature in reinforcement learning and strives to provide a meaningful extension to address more practical challenges. In particular, our payoff lower bounds and the monotonic policy improvement strategies are essentially the deterministic analogs of the conservative policy iteration and trust region policy optimization methods. Furthermore, the problems of learning from partial integro-differential equations and temporal differences are essentially two sides of the same coin. By relying on well-established paradigms from reinforcement learning, our work could take a small step toward expanding the practical applications of reinforcement and scientific machine learning.
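The small-sample bias mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not taken from the dissertation: the function names, the two-point distribution, and the target value are all hypothetical, and it assumes the "double-sampling trick" refers to the standard idea of multiplying residuals computed from two independent sample sets. It compares the exact expectation of a naive squared-residual loss with that of the double-sampling loss when the true loss is zero.

```python
from itertools import product
from statistics import fmean

# Hypothetical toy setup: the "integral" is E[g], with g uniform on VALUES.
# The loss we would like to estimate is (E[g] - TARGET)^2, which is 0 here.
VALUES = (0.0, 2.0)   # E[g] = 1, Var[g] = 1
TARGET = 1.0

def naive_loss(samples, target):
    """Square the residual of a single sample mean (biased by Var[g] / N)."""
    return (fmean(samples) - target) ** 2

def double_sampling_loss(samples_a, samples_b, target):
    """Multiply residuals of two independent sample means (unbiased)."""
    return (fmean(samples_a) - target) * (fmean(samples_b) - target)

def exact_expectation(estimator, values, n_draws):
    """Average an estimator over every equally likely tuple of n_draws values."""
    return fmean(estimator(draw) for draw in product(values, repeat=n_draws))

true_loss = (fmean(VALUES) - TARGET) ** 2

# Expected naive loss for N = 1, 2, 4 samples per estimate.
naive_expected = {
    n: exact_expectation(lambda d: naive_loss(d, TARGET), VALUES, n)
    for n in (1, 2, 4)
}

# Expected double-sampling loss: split each tuple of 2N draws into two halves.
double_expected = {
    n: exact_expectation(
        lambda d, n=n: double_sampling_loss(d[:n], d[n:], TARGET),
        VALUES,
        2 * n,
    )
    for n in (1, 2)
}

print("true loss:", true_loss)
print("naive expectation by N:", naive_expected)
print("double-sampling expectation by N:", double_expected)
```

In this toy case the naive expectation equals Var[g] / N (1.0, 0.5, and 0.25 for N = 1, 2, 4) even though the true loss is 0, while the double-sampling expectation is 0 for every N. The price of double sampling is that each loss evaluation needs two independent sample sets.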

Graduation Semester

2024-05

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/124270

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois
