Reinforcement learning with supervision beyond environmental rewards
Gangwani, Tanmay
Permalink
https://hdl.handle.net/2142/113914
Description
- Title
- Reinforcement learning with supervision beyond environmental rewards
- Author(s)
- Gangwani, Tanmay
- Issue Date
- 2021-12-03
- Director of Research (if dissertation) or Advisor (if thesis)
- Peng, Jian
- Doctoral Committee Chair(s)
- Peng, Jian
- Committee Member(s)
- Forsyth, David
- Zhai, ChengXiang
- Gupta, Saurabh
- Hofmann, Katja
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Reinforcement learning
- Imitation learning
- Abstract
- Reinforcement Learning (RL) is an elegant approach to tackling sequential decision-making problems. In the standard setting, the task designer curates a reward function, and the RL agent's objective is to take actions in the environment such that the long-term cumulative reward is maximized. Deep RL algorithms, which combine RL principles with deep neural networks, have been successfully used to learn behaviors in complex environments but are generally quite sensitive to the nature of the reward function. For a given RL problem, the environmental rewards could be sparse, delayed, misspecified, or unavailable (i.e., impossible to define mathematically for the required behavior). These scenarios exacerbate the challenge of training a stable deep-RL agent in a sample-efficient manner. In this thesis, we study methods that go beyond a direct reliance on the environmental rewards by generating additional information signals that the RL agent can incorporate to learn the desired skills. We start by investigating the performance bottlenecks in delayed-reward environments and propose to address them by learning surrogate rewards. We present two methods that compute the surrogate rewards from agent-environment interaction data. Then, we consider the imitation-learning (IL) setting, where we do not have access to any rewards but are instead provided with a dataset of expert demonstrations that the RL agent must learn to reliably reproduce. We propose IL algorithms for partially observable environments and for situations with discrepancies between the transition dynamics of the expert and the imitator. Next, we consider the benefits of learning an ensemble of RL agents with an explicit diversity pressure. We show that diversity encourages exploration and facilitates the discovery of sparse environmental rewards. Finally, we analyze the concept of sharing knowledge between RL agents operating in different but related environments and show that this information transfer can accelerate learning.
- Graduation Semester
- 2021-12
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/113914
- Copyright and License Information
- Copyright 2021 Tanmay Gangwani
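For concreteness, the "long-term cumulative reward" objective referenced in the abstract is conventionally written as the expected discounted return; the notation below is a generic sketch, not taken from the thesis itself:

J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]

The surrogate-reward methods summarized in the abstract can be read as replacing the environmental reward r(s_t, a_t) in this objective with a learned signal \hat{r}(s_t, a_t) computed from agent-environment interaction data.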
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)