Statistical reinforcement learning for individualized decision making
Zhou, Wenzhuo
Permalink
https://hdl.handle.net/2142/116243
Description
- Title
- Statistical reinforcement learning for individualized decision making
- Author(s)
- Zhou, Wenzhuo
- Issue Date
- 2022-07-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhu, Ruoqing
- Doctoral Committee Chair(s)
- Zhu, Ruoqing
- Committee Member(s)
- Qu, Annie
- Shao, Xiaofeng
- Li, Xinran
- Department of Study
- Statistics
- Discipline
- Statistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Markov Decision Process
- Reinforcement Learning
- Dynamic Treatment Regimes
- Survival Analysis
- Dimension Reduction
- Abstract
- Learning personalized patterns from highly heterogeneous data for decision making is one of the most challenging tasks in modern machine learning. It is nevertheless essential to the success of many real-world applications, such as personalized medicine, mobile health, robotics, and ride-sharing. In general, this decision-making procedure can be formally characterized by reinforcement learning or, in the healthcare domain, by dynamic treatment regimes. In this thesis, we address several major challenges for individualized decision making in reinforcement learning and dynamic treatment regimes. With respect to the number of decision stages, we tackle problems in both finite- and infinite-horizon settings. With respect to the action space, we propose methods applicable to discrete actions, e.g., binary and multicategory, as well as to continuous actions. In the first part of the thesis, we focus on developing a novel personalized decision-making procedure for dose assignment problems. Learning an individualized dose rule in personalized medicine is a challenging statistical problem. Existing methods often suffer from the curse of dimensionality, especially when the decision function is estimated nonparametrically. To tackle this problem, we propose a dimension reduction framework that effectively restricts the estimation to a lower-dimensional subspace of the covariates. We exploit the fact that the individualized dose rule can be defined on a subspace spanned by a few linear combinations of the covariates, leading to a more parsimonious model. Because it directly maximizes the value function, the proposed framework does not require inverse propensity score weighting under observational studies. Under the same framework, we further propose a pseudo-direct learning approach that focuses on estimating the dimensionality-reduced subspace of the treatment outcome.
In both approaches, the parameters can be estimated efficiently using an orthogonality-constrained optimization algorithm on the Stiefel manifold. Under mild regularity assumptions, we establish the asymptotic normality and consistency of the proposed estimators. In the second part, we develop a novel angle-based approach to search for the optimal dynamic treatment regime (DTR) under a multicategory treatment framework for survival data. The proposed method aims to maximize the conditional survival function of patients following a DTR. Specifically, it obtains the optimal DTR by integrating the estimation of decision rules at multiple stages into a single multicategory classification algorithm without imposing additional constraints, which is also more computationally efficient and robust. In theory, we establish Fisher consistency and provide a risk bound for the proposed estimator under regularity conditions. In the third part of the thesis, we tackle some fundamental challenges in reinforcement learning and mobile health. The practical use of mobile health (mHealth) technology poses unique challenges to existing methodologies for learning an optimal dynamic treatment regime. Many mHealth applications involve decision making with large numbers of intervention options and under an infinite-horizon setting, where the number of decision stages diverges to infinity. In addition, temporary medication shortages may render optimal treatments unavailable, while it is unclear what alternatives can be used. To address these challenges, we propose a proximal temporal consistency learning framework that estimates an optimal regime adaptively adjusted between deterministic and stochastic sparse policy models. The resulting minimax estimator avoids the double sampling issue of existing algorithms.
It can be further simplified and can easily incorporate off-policy data without mismatched distribution corrections. We study the theoretical properties of the sparse policy and establish finite-sample bounds on the excess risk and performance error.
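The abstract mentions estimating subspace parameters via orthogonality-constrained optimization on the Stiefel manifold. As a minimal illustrative sketch (not the thesis's actual algorithm), one generic way to take such a constrained gradient step is to project the Euclidean gradient onto the tangent space at the current point and then retract back onto the manifold with a thin QR decomposition; the value gradient `G` below is a hypothetical placeholder.

```python
import numpy as np

def project_tangent(X, G):
    # Project a Euclidean gradient G onto the tangent space of the
    # Stiefel manifold St(p, d) = {X : X^T X = I_d} at the point X.
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2

def qr_retraction(X, V):
    # Retract the step X + V back onto St(p, d) via thin QR; fixing the
    # signs of R's diagonal makes the retraction well defined.
    Q, R = np.linalg.qr(X + V)
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(0)
p, d = 10, 2                                       # covariate dim, subspace dim
X, _ = np.linalg.qr(rng.standard_normal((p, d)))   # random starting Stiefel point
G = rng.standard_normal((p, d))                    # hypothetical value-function gradient
X_new = qr_retraction(X, 0.1 * project_tangent(X, G))
# X_new still has orthonormal columns: X_new.T @ X_new ≈ I_d
```

In practice such steps would be iterated with a step-size rule until the value criterion converges; the sketch only shows that one update preserves the orthogonality constraint.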
- Graduation Semester
- 2022-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Wenzhuo Zhou
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)