dc.contributor.author: Gachoki, Peter
dc.contributor.author: Muraya, Moses
dc.contributor.author: Njoroge, Gladys
dc.date.accessioned: 2022-11-03T12:06:15Z
dc.date.available: 2022-11-03T12:06:15Z
dc.date.issued: 2022
dc.identifier.citation: Peter Gachoki, Moses Muraya, and Gladys Njoroge, “Features Selection in Statistical Classification of High Dimensional Image Derived Maize (Zea Mays L.) Phenomic Data.” American Journal of Applied Mathematics and Statistics, vol. 10, no. 2 (2022): 44-51. doi: 10.12691/ajams-10-2-2. [en_US]
dc.identifier.issn: 2328-7292
dc.identifier.uri: http://repository.chuka.ac.ke/handle/chuka/15499
dc.description.abstract: Phenotyping has advanced with the application of high-throughput phenotyping techniques such as automated imaging. This has yielded large quantities of high-dimensional phenotypic data that could not have been obtained in a single run using manual phenotyping. Hence, there is a need for parallel development of statistical techniques that can appropriately handle such large and/or high-dimensional data sets. Moreover, there is a need for statistical criteria for selecting the image-derived phenotypic features that serve as the best predictors in modelling plant growth. Information on such criteria is limited. The objective of this study is to apply feature importance, feature selection with Shapley values, and LASSO regression to find the subset of features with the highest predictive power for subsequent use in modelling maize plant growth using high-dimensional image-derived phenotypic data. The study compared the statistical power of these feature extraction methods by fitting an XGBoost model using the best features from each selection method. The image-derived phenomic data were obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany. Data analysis was performed using R statistical software. Missing values were imputed using the k-Nearest Neighbours technique. Feature extraction was performed using feature importance, Shapley values, and LASSO regression. Shapley values extracted 25 phenotypic features, feature importance extracted 31 features, and LASSO regression extracted 12 features. Of the three techniques, the feature importance criterion emerged as the best feature selection technique, followed by Shapley values and LASSO regression, respectively. The study demonstrated the potential of using feature importance as a selection technique for reducing the number of input variables in high-dimensional growth data sets. [en_US]
dc.language.iso: en [en_US]
dc.publisher: Science and Education Publishing [en_US]
dc.relation.ispartofseries: American Journal of Applied Mathematics and Statistics; Volume 10, 2022 - Issue 2
dc.subject: high throughput phenotyping [en_US]
dc.subject: high dimensional data [en_US]
dc.subject: feature extraction [en_US]
dc.subject: feature importance [en_US]
dc.subject: Shapley values [en_US]
dc.subject: LASSO regression [en_US]
dc.title: Features Selection in Statistical Classification of High Dimensional Image Derived Maize (Zea Mays L.) Phenomic Data [en_US]
dc.type: Article [en_US]
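
Illustrative note on the abstract: LASSO regression selects features by shrinking some coefficients exactly to zero, and the features with nonzero coefficients are the ones retained. The sketch below shows that mechanism with a plain coordinate-descent solver. It is not the authors' code: the study reports using R, whereas this is standard-library Python, and the function names (`soft_threshold`, `lasso_coordinate_descent`), the trait names, the toy data, and the penalty value `lam=0.5` are all assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of rows, y a list of responses. Columns of X and y are
    assumed to be centred already, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, and beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Hypothetical toy data: three centred, mutually orthogonal "traits".
names = ["trait_a", "trait_b", "trait_c"]
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Response built as 3 * trait_a + 0.1 * trait_c (trait_b plays no role).
y = [3.1, 2.9, -3.1, -2.9]

beta = lasso_coordinate_descent(X, y, lam=0.5)
# Features whose coefficient was not shrunk to zero are the selected subset.
selected = [name for name, b in zip(names, beta) if b != 0.0]
print(selected)
```

In this toy case the weak contribution of `trait_c` falls below the penalty and is dropped along with `trait_b`, so only `trait_a` is kept. A larger `lam` keeps fewer features, and a large enough value keeps none, which is one plausible reason a LASSO step can return a smaller subset than importance-based rankings.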

