Faster apprenticeship learning through inverse optimal control
Zaytsev, Andrey
Description
- Title
- Faster apprenticeship learning through inverse optimal control
- Author(s)
- Zaytsev, Andrey
- Issue Date
- 2017-12-05
- Advisor
- Peng, Jian
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Apprenticeship learning
- Inverse reinforcement learning
- Inverse optimal control
- Deep learning
- Reinforcement learning
- Machine learning
- Abstract
- One of the fundamental problems of artificial intelligence is learning how to behave optimally. With applications ranging from self-driving cars to medical devices, this task is vital to modern society. There are two complementary problems in this area: reinforcement learning and inverse reinforcement learning. While reinforcement learning tries to find an optimal strategy in a given environment with known rewards for each action, inverse reinforcement learning (also called inverse optimal control) seeks to recover the rewards associated with actions, given the environment and an optimal policy. Typically, apprenticeship learning is approached as a combination of these two techniques in an iterative process: at each step, inverse reinforcement learning is applied first to estimate the rewards, followed by reinforcement learning to produce a guess for an optimal policy. Each guess is used in subsequent iterations to refine the estimate of the reward function. While this works for problems with a small number of discrete states, the approach scales poorly. To mitigate these limitations, this research proposes a robust approach based on recent advances in the field of deep learning. Using the matrix formulation of inverse reinforcement learning, a reward function and an optimal policy can be recovered without having to iteratively optimize both. The approach scales well to problems with very large and continuous state spaces, such as autonomous vehicle navigation. An evaluation performed using OpenAI RLLab suggests that this method is robust and ready to be adopted for solving problems in both research and industry.
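The iterative loop the abstract describes (an IRL step to estimate rewards, then an RL step to find a policy for them, repeated) can be sketched as follows. This is a minimal, simplified variant in the spirit of feature-matching apprenticeship learning, run on a hypothetical four-state chain MDP with one-hot state features; the toy environment and all names are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

# Deterministic chain: action 0 moves left, action 1 moves right.
def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)

def feature_expectations(policy, start=0, horizon=50):
    """Discounted feature counts (one-hot state features) under a policy."""
    mu, s = np.zeros(N_STATES), start
    for t in range(horizon):
        mu[s] += GAMMA ** t
        s = step(s, policy[s])
    return mu

def value_iteration(w, iters=100):
    """RL step: optimal policy for the linear reward r(s) = w[s]."""
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = np.array([[w[s] + GAMMA * V[step(s, a)]
                       for a in range(N_ACTIONS)] for s in range(N_STATES)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Expert demonstration: the expert always moves right (prefers the last state).
mu_E = feature_expectations([1] * N_STATES)

# Iterative apprenticeship loop: the IRL step picks reward weights that
# separate the expert from the current policy guess; the RL step then
# computes the optimal policy for those weights.
policy = [0] * N_STATES                      # initial guess: always left
mu = feature_expectations(policy)
for _ in range(10):
    w = mu_E - mu                            # IRL step (simplified projection)
    if np.linalg.norm(w) < 1e-6:             # expert behavior matched
        break
    policy = list(value_iteration(w))        # RL step
    mu = feature_expectations(policy)

print(policy)  # recovers the expert's behavior: [1, 1, 1, 1]
```

Each pass through the loop is exactly the alternation the abstract criticizes for scaling poorly: a full RL solve (here, value iteration) is nested inside every IRL update, which becomes prohibitive for large or continuous state spaces.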
- Graduation Semester
- 2017-12
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/99228
- Copyright and License Information
- Copyright 2017 Andrey Zaytsev
Owning Collections
- Graduate Dissertations and Theses at Illinois (primary)
- Dissertations and Theses - Computer Science