SELECTION OF OPTIMAL FEATURES IN STATISTICAL MODELLING

Date

2021

Publisher

Chuka University

Abstract

In statistical modelling, selection of optimal features entails choosing the relevant predictor variables to be used in developing statistical models. Most modelling studies have focused on the construction of statistical models while skipping, or failing to document, the selection of the best features, which is an integral part of statistical modelling. This omission may lead to the use of duplicated features, features that are less relevant or have low variance, and random features, which can result in poorly performing prediction models. This study discusses how feature selection can be carried out as a prerequisite for statistical modelling. Some of the methods used to select the best features include forward selection, backward elimination, recursive elimination, entropy selection, variance threshold elimination, chi-square statistics, tree-based selection, feature importance and correlation matrices with heat maps. This study is of value to researchers building statistical models, since the use of optimal features in statistical modelling would lead to higher-performing statistical models.
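
As an illustration only (not taken from the paper), the sketch below shows how a few of the methods named in the abstract could be applied, assuming Python with scikit-learn and its bundled breast cancer dataset as a stand-in for the study's data: variance threshold elimination, chi-square statistics, recursive feature elimination, tree-based feature importance and a correlation matrix.

```python
# A minimal sketch (not from the paper): several of the feature selection
# methods named in the abstract, using scikit-learn's bundled breast cancer
# data as a stand-in for the study's dataset.
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, chi2
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Variance threshold elimination: drop near-constant (low-variance) features.
X_vt = VarianceThreshold(threshold=0.01).fit_transform(X)

# Chi-square statistics: keep the k features most associated with the target
# (the test requires non-negative feature values, which holds here).
X_chi2 = SelectKBest(chi2, k=10).fit_transform(X, y)

# Recursive feature elimination: repeatedly refit and drop the weakest feature
# until only the requested number remains.
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=10)
X_rfe = rfe.fit_transform(X, y)

# Tree-based selection / feature importance: rank features by the importance
# scores of an ensemble of trees.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top_ten = np.argsort(forest.feature_importances_)[::-1][:10]

# Correlation matrix: the basis for the heat map used to spot duplicated
# (highly correlated) predictors.
corr = pd.DataFrame(X).corr()

print("Top 10 features by importance:", top_ten)
print("Correlation matrix shape:", corr.shape)
```

Forward selection and backward elimination could be sketched in the same way, for instance with scikit-learn's SequentialFeatureSelector using direction='forward' or direction='backward'.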

Description

pkgachoki@gmail.com; moses.muraya@chuka.ac.ke

Keywords

Feature selection, forward selection, feature importance, correlation matrix with heatmaps

Citation

Gachoki, P. K., Njoroge, G. G. and Muraya, M. M. (2021). Selection of optimal features in statistical modelling. In: Isutsa, D. K. (Ed.). Proceedings of the 7th International Research Conference held at Chuka University from 3rd to 4th December 2020, Chuka, Kenya, pp. 555-564.