Optimization for stochastic, partially observed systems using a sampling-based approach to learn switched policies
Candido, Salvatore J.
Permalink
https://hdl.handle.net/2142/26081
Description
- Title
- Optimization for stochastic, partially observed systems using a sampling-based approach to learn switched policies
- Author(s)
- Candido, Salvatore J.
- Issue Date
- 2011-08-25
- Director of Research (if dissertation) or Advisor (if thesis)
- Hutchinson, Seth A.
- Doctoral Committee Chair(s)
- Hutchinson, Seth A.
- Committee Member(s)
- Meyn, Sean P.
- Coleman, Todd P.
- Zhou, Enlu
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Partially Observable Markov Decision Processes (POMDP)
- robotics
- machine learning
- motion planning
- stochastic control
- Abstract
- We propose a new method for learning policies for large partially observable Markov decision processes (POMDPs) that require long planning horizons. Computing optimal policies for POMDPs is intractable, and in practice dimensionality renders exact solutions unreachable even for small real-world systems of interest. For this reason, we restrict the policies we learn to the class of switched belief-feedback policies, in a manner that allows domain expert knowledge to be introduced into the planning process. This restriction has worked well for the systems on which we have tested it, and we conjecture that it will be useful for many real-world systems of interest. Our approach learns a switching law using a value-iteration-like method. Because the POMDP problem is intractable, we use a Monte Carlo approximation to evaluate system behavior and optimize the switching law from samples. We explicitly analyze the sensitivity of expected cost (performance) to the perturbations introduced by our approximations, and use that analysis to avoid approximation errors that could be disastrous when executing the computed policy. We demonstrate results on discrete POMDP problems from the literature and on a resource allocation problem modeled after a team of robots attempting to extinguish a forest fire. We then use our approach to build two algorithms that solve the minimum-uncertainty robot navigation problem, and show in simulation that the first can improve on techniques from the literature in terms of solution quality. The second uses information-theoretic heuristics to drive the sampling-based learning procedure; we conjecture that this is a useful direction toward an efficient, general stochastic motion planning algorithm.
- Graduation Semester
- 2011-08
- Copyright and License Information
- Copyright 2011 Salvatore J. Candido
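The abstract describes evaluating switched belief-feedback policies by Monte Carlo sampling: roll the system forward under the policy many times and average the accumulated cost. As an illustrative sketch only (this is not the dissertation's algorithm), the following estimates the expected cost of a belief-threshold "switched" policy on a hypothetical two-state tiger-style POMDP; all state names, costs, and probabilities are invented for the example.

```python
import random

# Hypothetical two-state "tiger" POMDP, purely for illustration.
# States: 0 = tiger behind left door, 1 = tiger behind right door.
P_CORRECT_OBS = 0.85     # listening reports the true side with this probability
COST_LISTEN = 1.0        # cost per listening action
COST_WRONG_DOOR = 100.0  # penalty for opening the tiger's door
COST_RIGHT_DOOR = -10.0  # negative cost (reward) for opening the safe door

def belief_update(b, obs):
    """Bayes update of b = P(tiger left) given obs in {0: heard left, 1: heard right}."""
    like_left = P_CORRECT_OBS if obs == 0 else 1 - P_CORRECT_OBS
    like_right = 1 - P_CORRECT_OBS if obs == 0 else P_CORRECT_OBS
    num = like_left * b
    return num / (num + like_right * (1 - b))

def switched_policy(b, threshold=0.9):
    """Belief-feedback policy that switches modes on a belief threshold:
    gather information while uncertain, commit to a door once confident."""
    if b >= threshold:
        return "open-right"   # confident the tiger is on the left
    if b <= 1 - threshold:
        return "open-left"
    return "listen"

def rollout(rng, max_steps=50):
    """One Monte Carlo sample of the cost incurred by the switched policy."""
    state = rng.randint(0, 1)
    b = 0.5                   # uniform initial belief
    cost = 0.0
    for _ in range(max_steps):
        a = switched_policy(b)
        if a == "listen":
            cost += COST_LISTEN
            obs = state if rng.random() < P_CORRECT_OBS else 1 - state
            b = belief_update(b, obs)
        else:
            safe = (a == "open-right" and state == 0) or \
                   (a == "open-left" and state == 1)
            cost += COST_RIGHT_DOOR if safe else COST_WRONG_DOOR
            return cost
    return cost

def monte_carlo_cost(n_samples=2000, seed=0):
    """Sample-average estimate of the policy's expected cost."""
    rng = random.Random(seed)
    return sum(rollout(rng) for _ in range(n_samples)) / n_samples
```

The switching law here is just a fixed belief threshold; the dissertation instead learns the switching law from such sampled evaluations, and analyzes how the sampling error propagates into the expected cost of the resulting policy.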
Owning Collections
- Graduate Dissertations and Theses at Illinois (primary)
- Dissertations and Theses - Electrical and Computer Engineering