Faster apprenticeship learning through inverse optimal control
Zaytsev, Andrey
Description
- Title
- Faster apprenticeship learning through inverse optimal control
- Author(s)
- Zaytsev, Andrey
- Issue Date
- 2017-12-05
- Advisor
- Peng, Jian
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Apprenticeship learning
- Inverse reinforcement learning
- Inverse optimal control
- Deep learning
- Reinforcement learning
- Machine learning
- Abstract
- One of the fundamental problems of artificial intelligence is learning how to behave optimally. With applications ranging from self-driving cars to medical devices, this task is vital to modern society. There are two complementary problems in this area: reinforcement learning and inverse reinforcement learning. While reinforcement learning tries to find an optimal strategy in a given environment with known rewards for each action, inverse reinforcement learning (also called inverse optimal control) seeks to recover the rewards associated with actions, given the environment and an optimal policy. Typically, apprenticeship learning is approached as a combination of these two techniques in an iterative process: at each step, inverse reinforcement learning is applied first to estimate the rewards, followed by reinforcement learning to produce a guess for an optimal policy. Each guess is used in subsequent iterations to refine the estimate of the reward function. While this works for problems with a small number of discrete states, the approach scales poorly. To mitigate these limitations, this research proposes a robust approach based on recent advances in the field of deep learning. Using the matrix formulation of inverse reinforcement learning, a reward function and an optimal policy can be recovered without having to iteratively optimize both. The approach scales well to problems with very large and continuous state spaces, such as autonomous vehicle navigation. An evaluation performed using OpenAI RLLab suggests that this method is robust and ready to be adopted for solving problems in both research and industry.
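The iterative loop the abstract describes (an IRL step to estimate rewards, then an RL step to find a policy for them, repeated) can be sketched as follows. This is a minimal, simplified variant in the spirit of feature-matching apprenticeship learning, run on a hypothetical four-state chain MDP with one-hot state features; the toy environment and all names are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

# Deterministic chain: action 0 moves left, action 1 moves right.
def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)

def feature_expectations(policy, start=0, horizon=50):
    """Discounted feature counts (one-hot state features) under a policy."""
    mu, s = np.zeros(N_STATES), start
    for t in range(horizon):
        mu[s] += GAMMA ** t
        s = step(s, policy[s])
    return mu

def value_iteration(w, iters=100):
    """RL step: optimal policy for the linear reward r(s) = w[s]."""
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = np.array([[w[s] + GAMMA * V[step(s, a)]
                       for a in range(N_ACTIONS)] for s in range(N_STATES)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Expert demonstration: the expert always moves right (prefers the last state).
mu_E = feature_expectations([1] * N_STATES)

# Iterative apprenticeship loop: the IRL step picks reward weights that
# separate the expert from the current policy guess; the RL step then
# computes the optimal policy for those weights.
policy = [0] * N_STATES                      # initial guess: always left
mu = feature_expectations(policy)
for _ in range(10):
    w = mu_E - mu                            # IRL step (simplified projection)
    if np.linalg.norm(w) < 1e-6:             # expert behavior matched
        break
    policy = list(value_iteration(w))        # RL step
    mu = feature_expectations(policy)

print(policy)  # recovers the expert's behavior: [1, 1, 1, 1]
```

Each pass through the loop is exactly the alternation the abstract criticizes for scaling poorly: a full RL solve (here, value iteration) is nested inside every IRL update, which becomes prohibitive for large or continuous state spaces.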
- Graduation Semester
- 2017-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/99228
- Copyright and License Information
- Copyright 2017 Andrey Zaytsev
Owning Collections
- Graduate Dissertations and Theses at Illinois (primary)
- Dissertations and Theses - Computer Science