Reinforcement learning with supervision beyond environmental rewards
Gangwani, Tanmay
Permalink
https://hdl.handle.net/2142/113914
Description
- Title
- Reinforcement learning with supervision beyond environmental rewards
- Author(s)
- Gangwani, Tanmay
- Issue Date
- 2021-12-03
- Director of Research (if dissertation) or Advisor (if thesis)
- Peng, Jian
- Doctoral Committee Chair(s)
- Peng, Jian
- Committee Member(s)
- Forsyth, David
- Zhai, ChengXiang
- Gupta, Saurabh
- Hofmann, Katja
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Reinforcement learning
- Imitation learning
- Abstract
- Reinforcement Learning (RL) is an elegant approach to tackling sequential decision-making problems. In the standard setting, the task designer curates a reward function, and the RL agent's objective is to take actions in the environment such that the long-term cumulative reward is maximized. Deep RL algorithms, which combine RL principles with deep neural networks, have been successfully used to learn behaviors in complex environments but are generally quite sensitive to the nature of the reward function. For a given RL problem, the environmental rewards could be sparse, delayed, misspecified, or unavailable (i.e., impossible to define mathematically for the required behavior). These scenarios exacerbate the challenge of training a stable deep-RL agent in a sample-efficient manner. In this thesis, we study methods that go beyond a direct reliance on the environmental rewards by generating additional information signals that the RL agent can incorporate to learn the desired skills. We start by investigating the performance bottlenecks in delayed-reward environments and propose to address them by learning surrogate rewards. We present two methods that compute the surrogate rewards from agent-environment interaction data. Then, we consider the imitation-learning (IL) setting, where we do not have access to any rewards but are instead provided with a dataset of expert demonstrations that the RL agent must learn to reliably reproduce. We propose IL algorithms for partially observable environments and for situations with discrepancies between the transition dynamics of the expert and the imitator. Next, we consider the benefits of learning an ensemble of RL agents with an explicit diversity pressure. We show that diversity encourages exploration and facilitates the discovery of sparse environmental rewards. Finally, we analyze the concept of sharing knowledge between RL agents operating in different but related environments and show that this information transfer can accelerate learning.
- Graduation Semester
- 2021-12
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/113914
- Copyright and License Information
- Copyright 2021 Tanmay Gangwani
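For concreteness, the "long-term cumulative reward" objective referenced in the abstract is conventionally written as the expected discounted return; the notation below is a generic sketch, not taken from the thesis itself:

J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]

The surrogate-reward methods summarized in the abstract can be read as replacing the environmental reward r(s_t, a_t) in this objective with a learned signal \hat{r}(s_t, a_t) computed from agent-environment interaction data.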
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)