Value function approximation architectures for neuro-dynamic programming
Chen, Wei
Permalink
https://hdl.handle.net/2142/46909
Description
- Title
- Value function approximation architectures for neuro-dynamic programming
- Author(s)
- Chen, Wei
- Issue Date
- 2014-01-16
- Director of Research (if dissertation) or Advisor (if thesis)
- Meyn, Sean P.
- Doctoral Committee Chair(s)
- Meyn, Sean P.
- Committee Member(s)
- Hajek, Bruce
- Hutchinson, Seth A.
- Nedich, Angelia
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Neuro-Dynamic Programming
- Parametric Q-learning
- Value Function Approximation
- Processor Power Management
- Data Center Power Management
- Cross-Layer Wireless Control
- Abstract
- Neuro-dynamic programming is a class of powerful techniques for approximating the solution to dynamic programming equations. In their most computationally attractive formulations, these techniques provide the approximate solution only within a prescribed finite-dimensional function class. Thus, the question that always arises is: how should the function class be chosen? In this dissertation, we first propose an approach based on the solutions to associated fluid and diffusion approximations, and we evaluate it by establishing bounds on the approximation errors. Next, we propose a novel parameterized Q-learning algorithm. Q-learning is a model-free method for computing the Q-function associated with an optimal policy, based on observations of states and actions. When the state or action space is large, Q-learning is often impractical because there are too many Q-function values to update. One way to address this problem is to approximate the Q-function within a function class; however, such methods often require an explicit model of the system, as in the split sampling method introduced by Borkar. The proposed algorithm is a reinforcement learning (RL) method, so the system dynamics are not assumed known; it is designed using approximations of the transition kernel of the Markov decision process (MDP). Lastly, we apply these value function approximation techniques to several applications. In the power management model, we focus on processor speed control to balance performance and energy usage, and then extend the results to the load balancing and power management problem of geographically distributed data centers with grid regulation. In the cross-layer wireless control problem, network utility maximization (NUM) and adaptive modulation (AM) are combined to balance network performance and transmission power. In each application, we show how to model the real problem as an MDP under reasonable assumptions and necessary approximations. Approximations of the value function are obtained for the specific models and evaluated through error bounds, and these approximate solutions are then used to construct basis functions for the learning algorithms in the simulations. (An illustrative sketch of Q-learning with linear function approximation appears after the metadata below.)
- Graduation Semester
- 2013-12
- Permalink
- http://hdl.handle.net/2142/46909
- Copyright and License Information
- Copyright 2013 Wei Chen
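The abstract describes parameterized Q-learning, in which the Q-function is approximated within a finite-dimensional function class spanned by basis functions and the parameter vector is updated from observed transitions, without an explicit system model. Below is a minimal sketch of that general idea (ordinary temporal-difference Q-learning with a linear-in-parameters approximation), not the dissertation's specific algorithm; the toy basis functions and all names in the sketch are illustrative assumptions.

import numpy as np

def basis(state, action):
    # Hypothetical hand-crafted feature vector psi(x, u); in the dissertation,
    # basis functions are instead constructed from fluid/diffusion value functions.
    return np.array([1.0, state, state ** 2, action, state * action])

def q_value(theta, state, action):
    # Linear-in-parameters approximation: Q_theta(x, u) = theta . psi(x, u).
    return theta @ basis(state, action)

def parametric_q_learning(transitions, actions, gamma=0.95, alpha=0.01, dim=5):
    # One pass of temporal-difference Q-learning over a list of observed
    # (state, action, reward, next_state) transitions.
    theta = np.zeros(dim)
    for x, u, r, x_next in transitions:
        # Greedy value at the next state under the current approximation.
        q_next = max(q_value(theta, x_next, u2) for u2 in actions)
        # Temporal-difference error for the observed transition.
        td = r + gamma * q_next - q_value(theta, x, u)
        # Stochastic-approximation update of the parameter vector.
        theta = theta + alpha * td * basis(x, u)
    return theta

For example, with actions = [0, 1] and a list of simulated transitions from any toy model, parametric_q_learning returns a weight vector theta defining the approximate Q-function q_value(theta, x, u).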
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Dissertations and Theses - Electrical and Computer Engineering