Browsing by Author "Gachoki, Peter"

EFFECT OF COOKING METHODS AND DURATION ON ASCORBIC ACID CONCENTRATION OF SELECTED VEGETABLES
(RJLBPCS, 2020-12-01) Wambua, Angeline; Kariuki, Richard; Gachoki, Peter; Okello, Vincent; Opiata, Patrick
Cooking vegetables before consumption makes them more palatable and improves their taste and texture. However, cooking also substantially changes their chemical composition, affecting the concentration and availability of nutrients. Some cooking methods oxidize antioxidants and thereby reduce nutrient retention, so it is vital to choose a cooking method that retains as many nutrients as possible. This study was designed to determine the reduction in vitamin C concentration associated with different methods of cooking green leafy vegetables. Two green leafy vegetables, African nightshade (Solanum nigrum) and amaranth (Amaranthus viridis), were selected for the study. Their vitamin C content was estimated spectrophotometrically at a wavelength of 245 nm. The processing methods employed were boiling and microwave heating, each applied for timed intervals. The findings showed that microwaving caused the greatest vitamin C loss and that long cooking durations led to large losses from the selected vegetables. It was concluded that the water used for cooking vegetables should not be discarded, so as to maximize vitamin C retention. Based on the study, it is recommended that Solanum nigrum and Amaranthus viridis be cooked for less than 10 minutes (depending on consumer preference) so as to retain maximum ascorbic acid.
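The "reduction in vitamin C concentration" reported above is usually expressed as a percentage of the raw (uncooked) value. The following is a minimal sketch of that arithmetic; the function name and all concentrations are hypothetical placeholders, not values from the study.

```python
def percent_loss(raw_mg, cooked_mg):
    """Percentage of vitamin C lost relative to the raw (uncooked) sample."""
    if raw_mg <= 0:
        raise ValueError("raw concentration must be positive")
    return 100.0 * (raw_mg - cooked_mg) / raw_mg


# Hypothetical concentrations in mg per 100 g, NOT measurements from the study.
raw = 50.0
after_boiling_5_min = 40.0
after_boiling_15_min = 25.0

loss_5 = percent_loss(raw, after_boiling_5_min)
loss_15 = percent_loss(raw, after_boiling_15_min)
print(f"5 min: {loss_5:.1f}% lost, 15 min: {loss_15:.1f}% lost")
```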
Features Selection in Statistical Classification of High Dimensional Image Derived Maize (Zea Mays L.) Phenomic Data
(Science and Education Publishing, 2022) Gachoki, Peter; Muraya, Moses; Njoroge, Gladys
Phenotyping has advanced with the application of high-throughput phenotyping techniques such as automated imaging. This has led to the derivation of large quantities of high-dimensional phenotypic data that could not have been obtained in a single run using manual phenotyping. Hence, there is a need for parallel development of statistical techniques that can appropriately handle such large and/or high-dimensional data sets. Moreover, there is a need for statistical criteria for selecting the image-derived phenotypic features that serve as the best predictors in modelling plant growth. Information on such criteria is limited. The objective of this study is to apply feature importance, feature selection with Shapley values and LASSO regression to find the subset of features with the highest predictive power for subsequent use in modelling maize plant growth from high-dimensional image-derived phenotypic data. The study compared the predictive power of these feature selection methods by fitting an XGBoost model using the best features from each method. The image-derived phenomic data were obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany. Data analysis was performed using the R statistical software. Missing data were imputed using the k-Nearest Neighbours technique. Feature selection was performed using feature importance, Shapley values and LASSO regression. The Shapley values selected 25 phenotypic features, feature importance selected 31 features and LASSO regression selected 12 features. Of the three techniques, the feature importance criterion emerged as the best feature selection technique, followed by Shapley values and LASSO regression, respectively. The study demonstrated the potential of feature importance as a selection technique for reducing the number of input variables in high-dimensional growth data sets.
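The k-Nearest Neighbours imputation step mentioned above can be illustrated with a small sketch. The study performed its analysis in R; the Python function below is an illustrative stand-in, and its distance measure (root mean squared difference over the columns two rows both have observed), the choice k=2 and the toy data are all assumptions rather than details from the paper.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance between two rows is the root mean squared difference over the
    columns that both rows have observed (an assumed, simple choice).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical phenotypic measurements; None marks a missing value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 9.0],
    [1.2, 1.9, 3.4],
]
imputed = knn_impute(data, k=2)
print(imputed[1])
```

In this toy example the missing entry is filled from the two rows that most resemble it on the observed columns, while the distant third row is ignored.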
MODELLING PLANT GROWTH BASED ON GOMPERTZ, LOGISTIC CURVE, EXTREME GRADIENT BOOSTING AND LIGHT GRADIENT BOOSTING MODELS USING HIGH DIMENSIONAL IMAGE DERIVED MAIZE (Zea mays L.) PHENOMIC DATA
(Chuka University, 2022-09) Gachoki, Peter
Modelling of plant growth is vital for hypothesis testing and for carrying out virtual plant growth and development experiments, which may otherwise take a long time under field conditions. Modelling of plant growth has been made more challenging by new phenotyping platforms that use a set of camera systems to generate high-dimensional data non-destructively over the entire growth period of a plant. Such platforms generate high-throughput phenomic data, which are complex and comprise many features collected at multiple growth points for the same plant. However, the classical models are limited in that they can only model a single feature at a time. Moreover, information on the usefulness of these features and on criteria for selecting them is limited. The objective of this study was to apply dynamic plant growth models that could be used to dissect complex relationships between plant growth and development using several modelling strategies. These included sigmoid models, Light GBM and XGBoost models. The image-derived phenomic data were obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany. Missing values in the image data were imputed using the k-Nearest Neighbours technique. Feature importance, Shapley values and LASSO regression were used to select the features that were used to fit the models. The Shapley values selected 25 phenotypic features, feature importance selected 31 features and LASSO regression selected 12 features. Of the three techniques, feature importance emerged as the best feature selection technique, since its features produced the best performing XGBoost model, with RMSE and R-squared values of 2.1641 and 0.8292, respectively. RMSE and R-squared were suitable performance metrics because the study addressed a regression problem.
The results showed that the XGBoost (RMSE = 2.1641) and Light GBM (RMSE = 2.7776) models performed better than the Gompertz (RMSE = 3.8378) and logistic function (RMSE = 3.8378) models in modelling maize plant growth. The XGBoost model (RMSE = 2.1641) showed better performance than the Light GBM model (RMSE = 2.7776). The Gompertz model for plant volume had AIC and BIC values of 139738.3 and 139763.4, respectively. The Gompertz model for plant side area had AIC and BIC values of 98436.15 and 98461.31, respectively. The logistic function model for plant volume had AIC and BIC values of 139749.2 and 139774.4, respectively. The logistic function model for plant side area had AIC and BIC values of 98415.95 and 98441.11, respectively. The Gompertz and logistic function models showed almost the same performance in modelling maize plant growth. The non-parametric models, XGBoost and Light GBM, were found to perform better than the classical models (Gompertz and logistic functions). Therefore, the study recommends the use of feature importance for feature selection whenever high-dimensional and complex phenotypic data are involved. Moreover, the study recommends the use of XGBoost as a generic model to fit high-dimensional and complex phenotypic data in modelling plant growth and to predict plant biomass yield at different growth points.
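For readers unfamiliar with the classical curves and the metrics quoted above, the sketch below writes them out. The parameterisations of the Gompertz and logistic curves are common textbook forms and may differ from those used in the thesis; the Gaussian log-likelihood, the parameter values and the observations are likewise assumptions chosen only for illustration.

```python
import math


def gompertz(t, a, b, c):
    """Gompertz curve a * exp(-b * exp(-c * t)); a is the upper asymptote."""
    return a * math.exp(-b * math.exp(-c * t))


def logistic(t, a, k, t0):
    """Logistic curve a / (1 + exp(-k * (t - t0))); t0 is the inflection point."""
    return a / (1.0 + math.exp(-k * (t - t0)))


def rmse(observed, predicted):
    """Root mean squared error between observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def gaussian_log_likelihood(observed, predicted):
    """Maximised log-likelihood assuming independent Gaussian residuals."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sigma2 = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def aic(log_lik, n_params):
    """Akaike information criterion: 2p - 2 ln L."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: p ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_lik


# Hypothetical growth measurements and parameter values, NOT from the thesis.
days = [0, 10, 20, 30, 40]
observed = [1.5, 4.0, 6.5, 8.2, 9.0]
gomp_pred = [gompertz(t, 10.0, 2.0, 0.08) for t in days]
logi_pred = [logistic(t, 10.0, 0.12, 15.0) for t in days]

for name, pred in (("Gompertz", gomp_pred), ("logistic", logi_pred)):
    ll = gaussian_log_likelihood(observed, pred)
    print(name, round(rmse(observed, pred), 4), round(r_squared(observed, pred), 4),
          round(aic(ll, 3), 2), round(bic(ll, 3, len(days)), 2))
```

Lower RMSE, AIC and BIC indicate a better fit, while R-squared closer to 1 indicates that more of the variance is explained. Note that AIC and BIC values are only comparable between models fitted to the same response variable.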
Predictive Modelling of Benign and Malignant Tumors Using Binary Logistic, Support Vector Machine and Extreme Gradient Boosting Models
(Science and Education Publishing, 2019-11-26) Gachoki, Peter; Mburu, Moses; Muraya, Moses
Breast cancer is the leading type of cancer among women worldwide, with about 2 million new cases and 627,000 deaths every year. Breast tumors can be malignant or benign. Medical screening can be used to determine the type of a diagnosed tumor. Alternatively, predictive modelling can be used to predict whether a tumor is malignant or benign. However, the accuracy of the prediction algorithms is important: a false negative may have dire consequences, since the patient would not be put on medication, which can lead to death, while a false positive may subject an individual to unnecessary stress and medication. Therefore, this study sought to develop and validate a new predictive model based on binary logistic, support vector machine and extreme gradient boosting models in order to improve the prediction accuracy for cancer tumors. This study used the Breast Cancer Wisconsin data set available on Kaggle. The dependent variable was whether a tumor is malignant or benign. The regressors were tumor features such as radius, texture, area, perimeter, smoothness, compactness, concavity, concave points, symmetry and fractal dimension. Data analysis was done using the R statistical software and involved generation of descriptive statistics, data reduction, feature selection and model fitting. Before model fitting, the reduced data were split into a training set and a validation set. The results showed that the binary logistic, support vector machine and extreme gradient boosting models had predictive accuracies of 96.97%, 98.01% and 97.73%, respectively. This was an improvement over existing models. The results of this study showed that support vector machine and extreme gradient boosting have better predictive power for cancer tumors than binary logistic regression. 
This study recommends the use of support vector machine and extreme gradient boosting in cancer tumor prediction and also recommends further investigation of other algorithms that can improve prediction.
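The abstract's concern with false negatives and false positives can be made concrete with a confusion matrix. The sketch below is illustrative only: the labels are hypothetical ("M" for malignant, "B" for benign) and are not predictions from the study, and the helper names are assumptions.

```python
def confusion_counts(actual, predicted, positive="M"):
    """Return (tp, fp, tn, fn), treating `positive` as the malignant class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1  # benign tumor flagged as malignant
        elif p != positive and a != positive:
            tn += 1
        else:
            fn += 1  # malignant tumor missed
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Share of malignant cases detected (1 - false negative rate)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of benign cases correctly cleared (1 - false positive rate)."""
    return tn / (tn + fp)


# Hypothetical validation labels, NOT results from the study.
actual = ["M", "M", "B", "B", "B", "M", "B", "B"]
predicted = ["M", "B", "B", "B", "M", "M", "B", "B"]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)
print(accuracy(tp, fp, tn, fn), sensitivity(tp, fn), specificity(tn, fp))
```

Accuracy alone does not distinguish the two kinds of error, so sensitivity and specificity are worth reporting alongside it when a missed malignant tumor is costlier than a false alarm.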