Statistical problems with deterministic reinforcement learning and small sample biases
Saleh, Ehsan
Permalink
https://hdl.handle.net/2142/124270
Description
- Title
- Statistical problems with deterministic reinforcement learning and small sample biases
- Author(s)
- Saleh, Ehsan
- Issue Date
- 2024-04-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Bretl, Timothy
- West, Matthew
- Doctoral Committee Chair(s)
- Bretl, Timothy
- West, Matthew
- Committee Member(s)
- Forsyth, David
- Jiang, Nan
- Cheng, Ching-An
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Reinforcement Learning
- Deterministic Search
- Truly Deterministic Policy Optimization
- Policy Gradient
- Model-Free Reinforcement Learning
- Robotics
- Scientific Machine Learning
- Delayed Target
- Physics-Informed Neural Networks
- Abstract
- This dissertation focuses on two main problems: (1) building a truly deterministic policy optimization method suitable for challenging robotic applications, and (2) addressing the small-sample biases associated with learning from integral losses in the context of scientific machine learning. These two topics are closely related to the policy gradient and the approximate dynamic programming techniques within reinforcement learning. The first part of the dissertation targets the practical difficulties of reinforcement learning regarding realistic robotic artifacts, such as the necessity to define non-local rewards, long decision-making horizons, and control systems with resonant frequencies. First, we derive a Wasserstein-based trust region lower bound of the payoff specifically for deterministic policy search purposes. This plays a key role in regulating the policy updates of our deterministic search method. Based on this, we then formally introduce our truly deterministic policy optimization method. The key feature of this method lies in its ability to avoid the need for exploratory noise injection. This allows our method to solve the aforementioned practical challenges safely and effectively. The second part of the dissertation focuses on the challenges involving integral loss functions while solving partial integro-differential equations with physics-informed neural networks. Such challenges are similar in nature to those encountered when solving the Bellman equation in reinforcement learning. In particular, we focus on the small-sample biases resulting from naive approximations to estimate the integral loss functions. We explore three potential solutions to this problem: the deterministic sampling and double-sampling tricks, and the delayed target method. Finally, we provide three numerical problems to extensively evaluate these potential solutions.
This work is mainly inspired by the existing literature in reinforcement learning and strives to provide a meaningful extension to address more practical challenges. In particular, our payoff lower bounds and the monotonic policy improvement strategies are essentially the deterministic analogs of the conservative policy iteration and trust region policy optimization methods. Furthermore, the problems of learning from partial integro-differential equations and temporal differences are essentially two sides of the same coin. By relying on well-established paradigms from reinforcement learning, our work could take a small step toward expanding the practical applications of reinforcement and scientific machine learning.
- Graduation Semester
- 2024-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2024 Ehsan Saleh
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois