Browsing by Author "Gachoki, P."
Features Selection in Statistical Classification of High Dimensional Image Derived Maize (Zea Mays L.) Phenomic Data
(Science and Education Publishing, 2022) Gachoki, P.; Muraya, M.; Njoroge, G.
Phenotyping has advanced with the application of high-throughput phenotyping techniques such as automated imaging. This has led to the derivation of large quantities of high-dimensional phenotypic data that could not have been obtained in a single run using manual phenotyping. Hence, there is a need for parallel development of statistical techniques that can appropriately handle such large and/or high-dimensional data sets. Moreover, statistical criteria are needed for selecting the image-derived phenotypic features that serve as the best predictors in modelling plant growth. Information on such criteria is limited. The objective of this study is to apply feature importance, feature selection with Shapley values and LASSO regression techniques to find the subset of features with the highest predictive power for subsequent use in modelling maize plant growth using high-dimensional image-derived phenotypic data. The study compared the statistical power of these feature extraction methods by fitting an XGBoost model using the best features from each selection method. The image-derived phenomic data was obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany. Data analysis was performed using R statistical software. Missing values were imputed using the k-Nearest Neighbours technique. Feature extraction was performed using feature importance, Shapley values and LASSO regression. Shapley values extracted 25 phenotypic features, feature importance extracted 31 features and LASSO regression extracted 12 features. Of the three techniques, the feature importance criterion emerged as the best feature selection technique, followed by Shapley values and LASSO regression, respectively.
The study demonstrated the potential of using feature importance as a selection technique for reducing the number of input variables in high-dimensional growth data sets.

Modelling Plant Growth Based on Gompertz, Logistic Curve, Extreme Gradient Boosting and Light Gradient Boosting Models Using High Dimensional Image Derived Maize (Zea mays L.) Phenomic Data
(Science and Education Publishing, 2022) Gachoki, P.; Muraya, M.; Njoroge, G.
Modelling of plant growth is vital for hypothesis testing and for carrying out virtual plant growth and development experiments, which may otherwise take a long time under field conditions. Modelling of plant growth has been complicated by new phenotyping platforms that generate high-dimensional data non-destructively over the entire growth time of a plant using a set of camera systems. Such platforms generate high-throughput phenomic data, which is complex and comprises many features collected at multiple growth points for the same plant. However, the classical models are limited in that they can only model a single feature at a time. The objective of this study was to apply dynamic plant growth models that could be used to dissect complex relationships between plant growth and development using several modelling strategies. These included sigmoid, Light GBM and XGBoost models. The image-derived phenomic data was obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany. The models were fitted using R statistical software and compared based on RMSE, R-squared, AIC and BIC performance metrics. The results showed that the XGBoost (RMSE = 2.1641) and Light GBM (RMSE = 2.7776) models performed better than the Gompertz (RMSE = 3.8378) and logistic function (RMSE = 3.8378) models in modelling maize plant growth. Of the two boosting models, XGBoost showed better performance than Light GBM.
The Gompertz model for plant volume had AIC and BIC values of 139738.3 and 139763.4, respectively. The Gompertz model for plant side area had AIC and BIC values of 98436.15 and 98461.31, respectively. The logistic function model for plant volume had AIC and BIC values of 139749.2 and 139774.4, respectively. The logistic function model for plant side area had AIC and BIC values of 98415.95 and 98441.11, respectively. The Gompertz and logistic function models showed almost the same performance in modelling maize plant growth. The non-parametric models, XGBoost and Light GBM, were found to perform better than the classical models (Gompertz and logistic functions) in modelling maize plant growth. Therefore, the study recommends the use of XGBoost as a generic model to fit high-dimensional and complex phenotypic data in modelling plant growth and in predicting plant biomass yield at different growth points.
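
For readers unfamiliar with the classical curves and metrics named in this abstract, the following is a minimal sketch, not the authors' implementation (the study itself used R). It shows one common parameterisation of the Gompertz and logistic growth curves, together with RMSE and least-squares forms of AIC and BIC. The parameter names, the synthetic data and the AIC/BIC formulas (which omit an additive constant and so will not match values reported by R's `AIC()`) are all assumptions made for illustration.

```python
import math


def gompertz(t, asymptote, displacement, rate):
    """Gompertz curve: asymptote * exp(-displacement * exp(-rate * t))."""
    return asymptote * math.exp(-displacement * math.exp(-rate * t))


def logistic(t, asymptote, rate, midpoint):
    """Logistic curve: asymptote / (1 + exp(-rate * (t - midpoint)))."""
    return asymptote / (1.0 + math.exp(-rate * (t - midpoint)))


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def aic_bic(observed, predicted, n_params):
    """Least-squares AIC and BIC, up to an additive constant (assumed form).

    AIC = n * ln(RSS / n) + 2 * k
    BIC = n * ln(RSS / n) + k * ln(n)
    Requires a non-zero residual sum of squares.
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    log_term = n * math.log(rss / n)
    return log_term + 2 * n_params, log_term + n_params * math.log(n)


if __name__ == "__main__":
    # Synthetic "plant size" series (hypothetical values, not the study's data):
    # a Gompertz trend with a small alternating perturbation.
    days = list(range(0, 60, 5))
    observed = [
        gompertz(t, 100.0, 4.0, 0.1) + (0.5 if i % 2 == 0 else -0.5)
        for i, t in enumerate(days)
    ]

    # Hand-picked parameters for illustration; a real analysis would estimate
    # them by non-linear least squares.
    candidates = {
        "Gompertz": [gompertz(t, 100.0, 4.0, 0.1) for t in days],
        "logistic": [logistic(t, 100.0, 0.15, 15.0) for t in days],
    }
    for name, predicted in candidates.items():
        aic, bic = aic_bic(observed, predicted, n_params=3)
        print(f"{name}: RMSE={rmse(observed, predicted):.4f} "
              f"AIC={aic:.2f} BIC={bic:.2f}")
```

Lower RMSE, AIC and BIC indicate a better fit under these definitions. Because both curves here have three parameters, the AIC and BIC rankings coincide with the ranking by residual sum of squares.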