Value function approximation architectures for neuro-dynamic programming
Chen, Wei
Permalink
https://hdl.handle.net/2142/46909
Description
- Title
- Value function approximation architectures for neuro-dynamic programming
- Author(s)
- Chen, Wei
- Issue Date
- 2014-01-16
- Director of Research (if dissertation) or Advisor (if thesis)
- Meyn, Sean P.
- Doctoral Committee Chair(s)
- Meyn, Sean P.
- Committee Member(s)
- Hajek, Bruce
- Hutchinson, Seth A.
- Nedich, Angelia
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Neuro-Dynamic Programming
- Parametric Q-learning
- Value Function Approximation
- Processor Power Management
- Data Center Power Management
- Cross-Layer Wireless Control
- Abstract
- Neuro-dynamic programming is a class of powerful techniques for approximating the solution to dynamic programming equations. In their most computationally attractive formulations, these techniques provide the approximate solution only within a prescribed finite-dimensional function class. Thus, the question that always arises is: how should the function class be chosen? In this dissertation, we first propose an approach based on the solutions to associated fluid and diffusion approximations, and we evaluate it by establishing bounds on the approximation errors. Next, we propose a novel parameterized Q-learning algorithm. Q-learning is a model-free method for computing the Q-function associated with an optimal policy, based on observations of states and actions. When the state or action space is large, Q-learning is often impractical because there are too many Q-function values to update. One way to address this problem is to approximate the Q-function within a function class; however, such methods often require an explicit model of the system, as in the split sampling method introduced by Borkar. The proposed algorithm is a reinforcement learning (RL) method, so the system dynamics are not assumed known; it is designed using approximations of the transition kernel of the Markov decision process (MDP). Lastly, we apply these value function approximation techniques to several applications. In the power management model, we focus on processor speed control to balance performance and energy usage, and then extend the results to the load balancing and power management problem of geographically distributed data centers with grid regulation. In the cross-layer wireless control problem, network utility maximization (NUM) and adaptive modulation (AM) are combined to balance network performance and transmission power. In each application, we show how to model the real problem as an MDP under reasonable assumptions and necessary approximations. Approximations of the value function are obtained for the specific models and evaluated through error bounds, and these approximate solutions are then used to construct basis functions for the learning algorithms in the simulations. (An illustrative sketch of Q-learning with linear function approximation appears after the metadata below.)
- Graduation Semester
- 2013-12
- Permalink
- http://hdl.handle.net/2142/46909
- Copyright and License Information
- Copyright 2013 Wei Chen
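The abstract describes parameterized Q-learning, in which the Q-function is approximated within a finite-dimensional function class spanned by basis functions and the parameter vector is updated from observed transitions, without an explicit system model. Below is a minimal sketch of that general idea (ordinary temporal-difference Q-learning with a linear-in-parameters approximation), not the dissertation's specific algorithm; the toy basis functions and all names in the sketch are illustrative assumptions.

import numpy as np

def basis(state, action):
    # Hypothetical hand-crafted feature vector psi(x, u); in the dissertation,
    # basis functions are instead constructed from fluid/diffusion value functions.
    return np.array([1.0, state, state ** 2, action, state * action])

def q_value(theta, state, action):
    # Linear-in-parameters approximation: Q_theta(x, u) = theta . psi(x, u).
    return theta @ basis(state, action)

def parametric_q_learning(transitions, actions, gamma=0.95, alpha=0.01, dim=5):
    # One pass of temporal-difference Q-learning over a list of observed
    # (state, action, reward, next_state) transitions.
    theta = np.zeros(dim)
    for x, u, r, x_next in transitions:
        # Greedy value at the next state under the current approximation.
        q_next = max(q_value(theta, x_next, u2) for u2 in actions)
        # Temporal-difference error for the observed transition.
        td = r + gamma * q_next - q_value(theta, x, u)
        # Stochastic-approximation update of the parameter vector.
        theta = theta + alpha * td * basis(x, u)
    return theta

For example, with actions = [0, 1] and a list of simulated transitions from any toy model, parametric_q_learning returns a weight vector theta defining the approximate Q-function q_value(theta, x, u).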
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Dissertations and Theses - Electrical and Computer Engineering