Recognizing cardiovascular disease patterns with machine learning using NHANES accelerometer determined physical activity data
Boiarskaia, Elena
Permalink
https://hdl.handle.net/2142/92805
Description
- Title
- Recognizing cardiovascular disease patterns with machine learning using NHANES accelerometer determined physical activity data
- Author(s)
- Boiarskaia, Elena
- Issue Date
- 2016-07-12
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhu, Weimo
- Doctoral Committee Chair(s)
- Zhu, Weimo
- Committee Member(s)
- Buchner, David
- Liang, Feng
- Wilund, Kenneth
- Department of Study
- Kinesiology & Community Health
- Discipline
- Kinesiology
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Machine learning
- accelerometers
- physical activity recommendations
- cardiovascular disease risk
- Reynolds risk score
- classification algorithms
- feature selection
- random forest
- decision tree
- support vector machine
- lasso regression
- neural network
- NHANES
- Abstract
- The relationship between physical activity (PA) and cardiovascular disease (CVD) is well established; however, questions about the appropriate dose of PA to reduce CVD risk still remain (Blair, LaMonte, & Nichaman, 2004; Pate et al., 1995). The optimal dose and the effects of intensity, duration, and frequency of PA are not fully understood (Haskell et al., 2007). This study connects objectively measured PA with a cross-sectional measure of CVD risk for an in-depth analysis of PA patterns that contribute to higher risk of CVD. Specifically, this study applied machine learning algorithms to NHANES accelerometer data from the 2003-2006 cohorts with the Reynolds cardiovascular risk score as the outcome. Using accelerometer data as a proxy for the Reynolds risk score to study cardiovascular disease risk allows the use of cross-sectional data when the longitudinal outcome is not known. A major benefit of using accelerometers to objectively measure PA is that the data is easy and inexpensive to obtain. Furthermore, most locomotive activities are measured with a high degree of accuracy. Accelerometers can gather highly detailed information about an individual’s PA pattern over extended periods of time. This produces a large amount of data that requires specialized techniques to analyze. The analysis for this study was conducted using a variety of machine learning techniques to identify individual patterns in the data and evaluate what contributes most to high CVD risk. Comparison of machine learning algorithms shows that all classifiers perform well when given appropriate features. Using predefined intensity thresholds to compute average time spent in a PA category yielded good classification results in identifying study participants at high and low risk for CVD (Troiano et al., 2008). Adding PA pattern-related features to the model did not appear to improve classification.
Features derived using k-means and the Hidden Markov Model (HMM) performed on par with features based on predefined intensity thresholds, indicating that data-driven methods may be used for feature extraction without relying on prior knowledge of the data. In general, the lasso regression, support vector machine (SVM), and random forest (RF) classifiers all performed well on large sets of data-driven features, achieving greater than 82% classification accuracy when time spent in PA intensity categories was combined with k-means and HMM-derived inputs. Neural networks performed well on smaller uncorrelated feature sets, and decision trees produced consistent results with the most transparency and interpretability. With respect to physical activity recommendations, the findings indicate that gender and time spent in lifestyle minutes (760-2019 intensity counts) play a key role in classifying CVD risk. Thus, a greater emphasis on gender-specific recommendations focusing on lifestyle minutes in addition to moderate and vigorous activity may be necessary. Furthermore, time spent in the activity categories, not how PA is spread throughout the day and week, appears to be most important for classification of CVD risk.
- Graduation Semester
- 2016-08
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/92805
- Copyright and License Information
- Copyright 2016 Elena Boiarskaia
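The abstract describes using predefined intensity thresholds to compute average time spent in each PA category. The following is a minimal sketch of that kind of feature, not the dissertation's actual code. Only the lifestyle band (760-2019 counts) is stated in the abstract; the other cut-points, the category names, and the function names are assumptions chosen for illustration.

```python
from bisect import bisect_right

# Category names and lower bounds in accelerometer counts per minute.
# Only the lifestyle band (760-2019) is taken from the abstract; the
# remaining cut-points are illustrative assumptions.
CATEGORIES = ["sedentary", "light", "lifestyle", "moderate", "vigorous"]
LOWER_BOUNDS = [0, 100, 760, 2020, 5999]


def categorize(count):
    """Map one minute-level count to an intensity category."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    # bisect_right finds how many lower bounds are <= count.
    return CATEGORIES[bisect_right(LOWER_BOUNDS, count) - 1]


def minutes_per_category(counts):
    """Tally minutes in each category for one day of minute-level counts."""
    totals = {name: 0 for name in CATEGORIES}
    for count in counts:
        totals[categorize(count)] += 1
    return totals


def average_daily_minutes(days):
    """Average minutes per category across a list of days.

    `days` is a list of per-day lists of minute-level counts.
    """
    if not days:
        raise ValueError("at least one day of data is required")
    sums = {name: 0 for name in CATEGORIES}
    for day in days:
        for name, minutes in minutes_per_category(day).items():
            sums[name] += minutes
    return {name: sums[name] / len(days) for name in CATEGORIES}
```

A call such as `average_daily_minutes([[0, 800, 2500], [50, 900, 6000]])` returns one value per category, which could then serve as one participant's input row for a classifier.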
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois