Random survival forests: quantifying uncertainties and other extensions
Formentini, Sarah Elizabeth
Permalink
https://hdl.handle.net/2142/116227
Description
- Title
- Random survival forests: quantifying uncertainties and other extensions
- Author(s)
- Formentini, Sarah Elizabeth
- Issue Date
- 2022-07-13
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhu, Ruoqing
- Doctoral Committee Chair(s)
- Zhu, Ruoqing
- Committee Member(s)
- Chen, Yuguo
- Shao, Xiaofeng
- Zhao, Sihai D
- Department of Study
- Statistics
- Discipline
- Statistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Survival random forests
- random forests
- U-statistics
- variable importance
- Abstract
- Analyzing censored survival data is one of the most commonly encountered problems in biomedical studies. Survival analysis differs from standard regression problems by one central feature: the event of interest may not be fully observed. The statistical methods used to analyze such data must therefore be adapted to handle the missing information. In chapter 1, we briefly introduce right-censored survival data and the survival random forest models used to analyze them. In addition to the statistical formulation, we provide details of the tuning parameters commonly considered in practice. In chapter 2, we propose a method for statistical inference on cumulative hazard predictions by extending recent developments in infinite-order incomplete U-statistics. Before our work, there was no methodology for calculating a confidence band for a survival random forest prediction. We introduce numerical methods for estimating a cumulative hazard prediction over the observed failure times and a covariance matrix of the predictions at each failure time. Then, using the covariance matrix and assuming a Gaussian distribution, we find a critical value for building a confidence band around the prediction. We show that the confidence bands contain the average random forest prediction at least 95% of the time in simulations and give an example with a real data set. In chapter 3, we introduce a method for statistical inference on variable importance estimates from a survival random forest. Previous work on this topic focused primarily on regression random forests, with some work on survival random forests. We outline variable importance estimation and the associated variance estimation using concepts similar to those used for the survival predictions. We then use those estimates to build a confidence interval. We show through simulations that these intervals cover the average random forest variable importance at least 93% of the time, which improves over the competing method at 84%.
In chapter 4, we propose new random survival forests that use information from existing studies to improve model fitting. Random survival forests are popular statistical models in biomedical studies, especially cancer studies with high-dimensional genetic information. With the abundance of cancer genetics and genomics data, new studies can borrow information from existing ones. We incorporate this information by constructing a new type of splitting rule that penalizes the marginal scores of a potential split, so that variables with strong known associations with the outcome are encouraged to be selected. We experiment with this penalized random survival forest using two types of prior information: the marginal p-value, which is often released by existing studies, and the variable importance measure calculated from the existing data when the complete data are available. We perform simulation studies to demonstrate the performance over existing single-data-set approaches and apply our method to the TCGA GBM and LGG data to discover brain tumor biomarkers.
- Graduation Semester
- 2022-08
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2022 Sarah Formentini
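Illustrative note on the abstract's chapter 2: the step "using the covariance matrix and assuming a Gaussian distribution, we find a critical value" can be pictured with a generic simultaneous band. The following is a minimal sketch under stated assumptions, not the dissertation's procedure. It assumes a cumulative hazard prediction and its covariance matrix are already available, and it uses a standard Monte Carlo "maximum standardized deviation" construction. The function name `gaussian_confidence_band`, its parameters, and the example numbers are all hypothetical.

```python
import numpy as np

def gaussian_confidence_band(pred, cov, level=0.95, n_draws=20000, seed=0):
    """Simultaneous band for a prediction curve under a Gaussian assumption.

    pred : predicted cumulative hazard at K time points (hypothetical input)
    cov  : K x K covariance matrix of those predictions (hypothetical input),
           assumed to have strictly positive diagonal entries
    """
    pred = np.asarray(pred, dtype=float)
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))  # pointwise standard errors

    # Draw centered Gaussian vectors with the given covariance.
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(len(pred)), cov, size=n_draws)

    # Critical value: quantile of the largest standardized deviation
    # across all time points, so the band holds jointly over time.
    max_abs = np.max(np.abs(draws) / sd, axis=1)
    crit = np.quantile(max_abs, level)

    # A cumulative hazard is nonnegative, so truncate the lower limit at 0.
    lower = np.maximum(pred - crit * sd, 0.0)
    upper = pred + crit * sd
    return crit, lower, upper

# Made-up example: four time points, variance growing over time.
pred = [0.1, 0.25, 0.45, 0.7]
cov = [[0.01 * min(i, j) for j in range(1, 5)] for i in range(1, 5)]
crit, lower, upper = gaussian_confidence_band(pred, cov)
print(crit, lower, upper)
```

Because the critical value is taken over the maximum across time points, it is larger than the pointwise normal quantile of about 1.96, which is what distinguishes a confidence band from a set of pointwise intervals.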
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois