Prediction of moisture and protein in corn kernels from multiple origins based on NIR-PLSR with gradient boosting machines for feature selection
Zheng, Runyu
Permalink
https://hdl.handle.net/2142/124601
Description
Title
Prediction of moisture and protein in corn kernels from multiple origins based on NIR-PLSR with gradient boosting machines for feature selection
Author(s)
Zheng, Runyu
Issue Date
2024-05-02
Director of Research (if dissertation) or Advisor (if thesis)
Kamruzzaman, Mohammed
Committee Member(s)
Allen, Cody M.
Rausch, Kent D.
Singh, Vijay
Department of Study
Engineering Administration
Discipline
Agricultural & Biological Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
gradient boosting machine (GBM)
feature selection
SHapley Additive exPlanations (SHAP)
partial least squares regression (PLSR)
corn kernels
near-infrared (NIR) spectroscopy
component prediction
Abstract
Differences in moisture levels and protein content affect both the nutritional value and the processing efficiency of corn kernels. Near-infrared (NIR) spectroscopy can be used to estimate kernel composition, but the models used for this are typically trained on samples collected from only a few environments, which can lead to underestimation of both model error rates and model bias. In this study, corn samples grown across an internationally diverse set of environments were assembled. NIR spectroscopy with chemometrics and partial least squares regression (PLSR) was used to determine the moisture and protein content of this international panel of corn grain samples. The potential of five feature selection methods to improve prediction accuracy by extracting sensitive wavelengths for moisture and protein in corn kernels was assessed. SHapley Additive exPlanations (SHAP) values were used to measure the impact of each feature (wavelength) on the model prediction. Gradient boosting machines (GBMs), specifically CatBoost and LightGBM, were effective in selecting crucial wavelengths for moisture (1409, 1900, 1908, 1932, 1953, and 2174 nm) and protein (887, 1212, 1705, 1891, 2097, and 2456 nm), producing PLSR models with coefficients of determination of validation (R²V) of 0.97 and 0.82, root mean square errors of validation (RMSEV) of 0.45% and 0.51%, and ratios of performance to deviation of validation (RPDV) of 6.20 and 2.41, for kernel protein and kernel moisture content, respectively. SHAP plots revealed the significant contribution of 2174 nm to moisture prediction and of 1891 nm to protein prediction, as well as their respective influence tendencies. These results illustrate the effectiveness of GBMs for feature engineering in NIR spectroscopy when predicting chemical components in the agriculture and food sectors, including the development of a multi-country global calibration model for moisture and protein in corn kernels.
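The three validation statistics reported in the abstract can be illustrated with a minimal sketch. It assumes the common chemometrics definitions: R² as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared residual, and RPD as the sample standard deviation of the reference values divided by the RMSE. The record does not state which exact formulas the thesis used, so these definitions, the function name validation_metrics, and the sample values are all assumptions for illustration only.

```python
import math
import statistics


def validation_metrics(y_true, y_pred):
    """Return (R2, RMSE, RPD) for paired reference and predicted values.

    Assumed definitions (not confirmed by the thesis record):
      R2   = 1 - SS_res / SS_tot
      RMSE = sqrt(SS_res / n)
      RPD  = sample standard deviation of reference values / RMSE
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(y_true)
    # Residual sum of squares between reference and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the reference values.
    mean_true = statistics.fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    rpd = statistics.stdev(y_true) / rmse
    return r2, rmse, rpd


# Hypothetical moisture values in percent; these are NOT data from the thesis.
reference = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]
r2, rmse, rpd = validation_metrics(reference, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f} RPD={rpd:.3f}")
```

Under these definitions RPD grows as the prediction error shrinks relative to the natural spread of the reference values, which is why a model with a higher R² also tends to show a higher RPD.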