Mathematics and Statistics

Permanent URI for this collection: https://repository.chuka.ac.ke/handle/chuka/753

Recent Submissions

  • Item
    Modeling and Forecasting Kenyan GDP Using Autoregressive Integrated Moving Average (ARIMA) Models
    (Science Publishing Group, 2016-04-13) Musundi, Sammy Wabomba; M’mukiira, Peter Mutwiri; Mungai, Fredrick
    The Gross Domestic Product (GDP) is the market value of all goods and services produced within the borders of a nation in a year. In this paper, Kenya’s annual GDP data obtained from the Kenya National Bureau of Statistics for the years 1960 to 2012 were studied. Gretl and SPSS 21 statistical software packages were used to build a class of ARIMA (autoregressive integrated moving average) models following the Box-Jenkins method to model the GDP. The ARIMA (2, 2, 2) time series model was established as the best for modeling the Kenyan GDP according to the recognition rules and stationarity test of time series under the AIC criterion. The results of an in-sample forecast showed that the relative and predicted values were within the range of 5%, and the forecasting effect of this model was relatively adequate and efficient in modeling the annual returns of the Kenyan GDP. Finally, we used the fitted ARIMA model to forecast the GDP of Kenya for the next five years.
  • Item
    A NOTE ON QUASI-SIMILARITY OF OPERATORS IN HILBERT SPACES
    (International Journal of Mathematical Archive, 2015-07-23) SAMMY W. MUSUNDI; ISAIAH N. SITATI; BERNARD M. NZIMBI; KIKETE W. DENNIS
    In this paper we introduce the notion of quasi-similarity of bounded linear operators in Hilbert spaces. We do so by defining a quasi-affinity from one Hilbert space H to another Hilbert space K. Some results on quasi-affinities are also discussed. It has already been shown that on a finite dimensional Hilbert space, quasi-similarity is an equivalence relation, that is, it is reflexive, symmetric and transitive. Using the definition of commutants of two operators, we give an alternative result to show that quasi-similarity is an equivalence relation on an infinite dimensional Hilbert space. Finally, we establish the relationship between the quasi-similarity and almost similarity equivalence relations in Hilbert spaces using hermitian and normal operators.
  • Item
    The Banach Numerical Range for Finite Linear Operators
    (Modern Scientific Press Company, 2020-02-14) M. Ohuru, Priscah; W. Musundi, Sammy
    The numerical range has been a subject of interest to many researchers and scholars in the recent past. Based on the research outputs, many results have been obtained. Besides, several generalizations of the classical numerical range have also been made. The recent developments have focused on the theory of operators on Hilbert spaces. The numerical ranges of linear and nonlinear operators have been determined in both Hilbert and Banach spaces. In addition, results on these numerical ranges have been extended to the case of two operators in both spaces. It is important to note that more generalizations have been made in Hilbert spaces than in Banach spaces. The Banach space has two major numerical ranges: the spatial and the algebraic numerical ranges. This research focuses on determining the numerical range for a finite number of linear operators in the Banach space based on the classical definition. Properties which hold for the classical numerical range have been shown to hold for the Banach space numerical range. The property of convexity has been established using the Toeplitz-Hausdorff theorem under the condition that the Banach space is smooth. Furthermore, the numerical radius and the spectrum of these operators have also been determined.
  • Item
    PLS Generalized Linear Regression and Kernel Multilogit Algorithm (KMA) for Microarray Data Classification Problem
    (Revista Colombiana de Estadística - Applied Statistics, 2020) Wagala, Adolphus; González-Farías, Graciela; Ramos, Rogelio
    This study involves the implementation of extensions of the partial least squares generalized linear regression (PLSGLR) by combining it with logistic regression and linear discriminant analysis, to get a partial least squares generalized linear regression-logistic regression model (PLSGLR-log), and a partial least squares generalized linear regression-linear discriminant analysis model (PLSGLRDA). A comparative study of the obtained classifiers with classical methodologies like the k-nearest neighbours (KNN), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), ridge partial least squares (RPLS), and support vector machines (SVM) is then carried out. Furthermore, a new methodology known as the kernel multilogit algorithm (KMA) is also implemented and its performance compared with those of the other classifiers. The KMA emerged as the best classifier based on the lowest classification error rates compared to the others when applied to the types of data considered: the unpreprocessed and the preprocessed.
  • Item
    A likelihood ratio test for correlated paired multivariate samples
    (Chilean Statistical Society, 2020-04) Wagala, Adolphus
    Many laboratory experiments in the biological sciences involve two main groups, say the healthy and the infected subjects. In one such kind of experiment, each specimen from each group can be divided into two portions; one portion is stimulated while the other remains unstimulated. This results in two main groups with paired measurements that are correlated. For all the groups, p genes are measured for expression. The stimulation in this case can be done by introducing a known infection-causing micro-organism like the group A streptococcus, which is usually associated with acute rheumatic fever. An important question in such an experiment would be to statistically test for the differences in the differences in means for the healthy and the infected groups. That is, the difference in the means of the healthy group (stimulated and unstimulated) is tested against the difference in the means of the infected (stimulated and unstimulated) group. In this paper, a likelihood ratio test statistic is developed for such problems. The developed statistic and the Hotelling T² statistic are both applied to data simulated from real biological situations and their performances are compared. The simulated data exhibit a correlation structure similar to that of real biological data obtained from experiments involving the milliplex analyst biomarker data sets. The results indicate that the proposed test statistic gives the same conclusions for the hypotheses tested as those of the Hotelling T² test. However, the proposed test is intuitively more appealing since it takes care of the correlations between the pairs in the data. The simulation study confirms that the test statistics follow a chi-square distribution.
    This research contributes a theoretical analysis of paired correlated samples motivated by a practical problem for which the existing statistical methods in use have seldom taken into account the correlation structure of the data.
  • Item
    Application of Response Surface Methodology in Optimization of the Yields of Common Bean (Phaseolus vulgaris L.) Using Animal Manures
    (Science and Education Publishing, 2020-07) Masai, Kimtai Leonard; Muraya, Moses M; Wagala, Adolphus
    The objective of design and analysis of experiments is to optimize a response variable which is influenced by several independent variables. In agriculture, many statistical studies have focused on investigating the effect of application of organic manure on the yield and yield components of crops. However, many of these studies do not try to optimize the application of the manures for maximum productivity, but select the best treatment among the treatment range used. This is mainly due to the design and analysis of experiments applied. Therefore, there is a need to apply a statistical method that would establish the effect of the application of organic manures on crop production and in addition optimize the levels of application of these manures for maximum productivity. This study aimed at applying response surface methodology to optimize the yields of common bean (Phaseolus vulgaris L.) using animal manure. The study was conducted at Chuka University Horticultural Demonstration Farm. The experiment was laid out in a Randomized Complete Block Design. The treatments consisted of three organic manure sources (cattle manure, poultry manure and goat manure) each at three levels (0, 3 and 6 tonnes per ha). Data were collected from six weeks after sowing to physiological maturity. Data were collected on the weight of the grain yield harvested in each experimental plot, measured using a weighing scale. The data collected were analysed using the R statistical software. The study findings indicated that animal manures had a significant effect (p < 0.05) on the yield of common beans. The results also showed that the optimum levels of application of the manures in the area of study were 2.1608 t ha⁻¹, 12.7213 t ha⁻¹ and 4.1417 t ha⁻¹ for cattle manure, poultry manure and goat manure, respectively. These were the optimum levels that would lead to maximum yield of common beans without an extra cost of input.
  • Item
    Features Selection in Statistical Classification of High Dimensional Image Derived Maize (Zea Mays L.) Phenomic Data
    (Science and Education Publishing, 2022) Gachoki, Peter; Muraya, Moses; Njoroge, Gladys
    Phenotyping has advanced with the application of high throughput phenotyping techniques such as automated imaging. This has led to the derivation of large quantities of high dimensional phenotypic data that could not have been achieved using manual phenotyping in a single run. Hence the need for parallel development of statistical techniques that can appropriately handle such large and/or high dimensional data sets. Moreover, there is a need to come up with statistical criteria for selecting the best image derived phenotypic features that can be used as best predictors in modelling plant growth. Information on such criteria is limited. The objective of this study is to apply feature importance, feature selection with Shapley values and LASSO regression techniques to find the subset of features with the highest predictive power for subsequent use in modelling maize plant growth using high dimensional image derived phenotypic data. The study compared the statistical power of these feature extraction methods by fitting an XGBoost model using the best features from each selection method. The image derived phenomic data were obtained from the Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany. Data analysis was performed using the R statistical software. The data were subjected to data imputation using the k-Nearest Neighbours technique. Feature extraction was performed using feature importance, Shapley values and LASSO regression. The Shapley values extracted 25 phenotypic features, feature importance extracted 31 features and LASSO regression extracted 12 features. Of the three techniques, the feature importance criterion emerged as the best feature selection technique, followed by Shapley values and LASSO regression, respectively. The study demonstrated the potential of using feature importance as a selection technique for reducing the input variables in high dimensional growth data sets.
  • Item
    Transitivity Action of the Cartesian Product of the Alternating Group Acting on a Cartesian Product of Ordered Sets of Triples
    (Asian Research Journal of Mathematics, 2021-12) Maraka, K.; Musundi, W.S.; Nyaga, L.N.
    In this paper, we investigate some transitivity action properties of the cartesian product of the alternating group $A_n$ ($n \ge 5$) acting on a cartesian product of ordered sets of triples using the Orbit-Stabilizer Theorem, by showing that the length of the orbit of $(p, s, v)$ in $A_n \times A_n \times A_n$ ($n \ge 5$) acting on $P^{[3]} \times S^{[3]} \times V^{[3]}$ is equivalent to the cardinality of $P^{[3]} \times S^{[3]} \times V^{[3]}$, to imply transitivity.
  • Item
    Spectral Picture Of Almost Similar Operators
    (Journal of Multidisciplinary Engineering Science and Technology (JMEST), 2022) Muriithi, Eric Gitonga; Musundi, Sammy W.; Nzimbi, Bernard M.
    Various results that relate to almost similarity and other classes of operators such as isometry, normal, unitary and compact operators have been extensively discussed. In this paper, we describe the spectral picture of almost similar operators. To be more specific we will describe the spectrum, the spectral radius, the numerical radius as well as the norm of almost similar operators.
  • Item
    Norm-Attainability of Generalized Finite Operators on C*-Algebra
    (Modern Scientific Press Company, 2022) Sule, Amenya C.; Sammy, Musundi W.; Jacob, Kirimi
    Norm-attainability of elementary operators on Hilbert and Banach spaces has been characterized by many mathematicians. However, there is little information on norm-attainability of generalized finite operators on C*-algebra. A pair of bounded linear operators $A, B$ on a complex Hilbert space $H$ is called generalized finite operators if $\|AX - XB - I\| \ge 1$ for each $X \in B(H)$. This paper therefore determines the norm-attainability of these generalized finite operators on C*-algebra when implemented by norm-attainable operators $A, B$.
  • Item
    Forecasting Commodity Price Index of Food and Beverages in Kenya Using Seasonal Autoregressive Integrated Moving Average (SARIMA) Models
    (EJ-MATH, European Journal of Mathematics and Statistics, 2021) Wanjuki, Teddy Mutugi; Wagala, Adolphus; Muriithi, Dennis K.
    Price stability is the primary monetary policy objective in any economy since it protects the interests of both consumers and producers. As a result, forecasting is a common practice and a vital aspect of monetary policymaking. Future predictions guide the monetary and fiscal policy tools that can be used to stabilize commodity prices. As a result, developing an accurate and precise forecasting model is critical. The current study fitted and forecasted the food and beverages price index (FBPI) in Kenya using seasonal autoregressive integrated moving average (SARIMA) models. Unlike other ARIMA models like the autoregressive (AR), moving average (MA), and non-seasonal ARMA models, the SARIMA model accounts for the seasonal component in a given time series, which yields better forecasts. The study relied on secondary data obtained from the KNBS website on the monthly food and beverage price index in Kenya from January 1991 to February 2020. The R statistical software was used to analyze the data. The parameter estimation was done using the Maximum Likelihood Estimation method. Competing SARIMA models were compared using the Mean Absolute Error (MAE), Mean Absolute Scaled Error (MASE), and Mean Absolute Percentage Error (MAPE). A first-order differenced SARIMA (1,1,1)(0,1,1)₁₂ minimized these model evaluation criteria (AIC = 1818.15, BIC = 1833.40). The forecasting ability evaluation statistics were MAE = 2.00%, MAPE = 1.62% and MASE = 0.87%. The 24-step ahead forecasts showed that the FBPI is unstable with an overall increasing trend. Therefore, the monetary policy committee ought to control inflation through monetary or fiscal policy, strengthening food security and trade liberalization.
  • Item
    Application of Principal Component Analysis and Hierarchical Regression Model on Kenya Macroeconomic Indicators
    (EJ-MATH, European Journal of Mathematics and Statistics, 2022) Mbaluka, Morris Kateeti; Muriithi, Dennis K.; Njoroge, Gladys G.
    The aim of this paper was to apply Principal Component Analysis (PCA) and a hierarchical regression model to Kenyan macroeconomic variables. The study adopted a mixed research design (descriptive and correlational research designs). Data on 18 macroeconomic variables were extracted from the Kenya National Bureau of Statistics and the World Bank for the period 1970 to 2019. The R software was utilized to conduct all the data analysis. Principal Component Analysis was used to reduce the dimensionality of the data, where the original data set matrix was reduced to eigenvectors and eigenvalues. A hierarchical regression model was fitted on the extracted components, and R² was used to determine whether the components were a good fit for predicting economic growth. The results from the study showed that the first component explained 73.605% of the overall variance and was highly correlated with 15 original variables. Additionally, the second principal component described approximately 10.03% of the total variance, with two variables having high positive loadings on it. About 6.22% of the overall variance was explained by the third component, which was highly correlated with only one of the original variables. The first, second, and third models had F statistics of 2385.689, 1208.99, and 920.737, respectively, each with a p-value of 0.0001 < 5%, implying that the models were significant. The third model had the lowest mean square error of 17.296 and was hence described as the best predictive model. Since component 1 had the highest variance explained, and model 1 had a lower p-value than the other models, principal component 1 was more reliable in explaining economic growth. Therefore, it was concluded that the macroeconomic variables associated with the monetary economy, the trade and openness of the economy with government activities, the consumption factor of the economy, and the investment factor of the economy predict economic growth in Kenya.
    The study recommends that PCA should be utilized when dealing with more than 15 variables, and that the hierarchical regression model building technique be used to determine the partial variance change among the independent variables in regression modeling.
  • Item
    Singular Spectrum Analysis: An Application to Kenya’s Industrial Inputs Price Index
    (EJ-MATH, European Journal of Mathematics and Statistics, 2021) Kimutai, K. Emmanuel; Wagala, Adolphus; Muriithi, Dennis K.
    Time series modeling and forecasting techniques serve as gauging tools to understand the time-related properties of a given time series and its future course. Most financial and economic time series data do not meet the restrictive assumptions of normality, linearity, and stationarity of the observed data, limiting the application of classical models without data transformation. As a non-parametric method, Singular Spectrum Analysis (SSA) is data-adaptive and hence does not necessarily require these restrictive assumptions, as classical methods do. The current study employed a longitudinal research design to evaluate how SSA fits Kenya’s monthly industrial inputs price index from January 1992 to April 2022. Since 2018, reducing the costs of industrial inputs has been one of Kenya’s manufacturing agendas to level the playing field and foster Kenya’s manufacturing sector. It was expected that Kenya’s Manufacturing Value Added would reach 22% by 2022. The study results showed that the SSA (L = 12, r = 7) (MAPE = 0.707%) provides more reliable forecasts. The 24-period forecasts showed that the industrial inputs price index remains high above the index in 2017 before the post-industrial agenda targeting a reduction in the cost of industrial inputs. Thus, the industrial input prices should be reduced to a sustainable level.
  • Item
    Singular Spectrum Analysis: An Application to Kenya’s Industrial Inputs Price Index
    (Springer, 2022-01) Kimutai, Emmanuel K.; Wagala, Adolphus; Muriithi, Dennis K.
    Time series modelling and forecasting techniques serve as gauging tools to understand the time-related properties of a given time series and its future course. Most financial and economic time series data do not meet the restrictive assumptions of normality, linearity, and stationarity of the observed data, limiting the application of classical models without data transformation. As a non-parametric method, Singular Spectrum Analysis (SSA) is data-adaptive and hence does not necessarily require these restrictive assumptions, as classical methods do. The current study employed a longitudinal research design to evaluate how SSA fits Kenya’s monthly industrial inputs price index from January 1992 to April 2022. Since 2018, reducing the costs of industrial inputs has been one of Kenya’s manufacturing agendas to level the playing field and foster Kenya’s manufacturing sector. It was expected that Kenya’s Manufacturing Value Added would reach 22% by 2022. The study results showed that the SSA (L = 12, r = 7) (MAPE = 0.707%) provides more reliable forecasts. The 24-period forecasts showed that the industrial inputs price index remains high above the index in 2017 before the post-industrial agenda targeting a reduction in the cost of industrial inputs. Thus, the industrial input prices should be reduced to a sustainable level.
  • Item
    Forecasting Commodity Price Index of Food and Beverages in Kenya Using Seasonal Autoregressive Integrated Moving Average (SARIMA) Models
    (Springer, 2021-12) Wanjuki, Teddy Mutugi
    Price stability is the primary monetary policy objective in any economy since it protects the interests of both consumers and producers. As a result, forecasting is a common practice and a vital aspect of monetary policymaking. Future predictions guide the monetary and fiscal policy tools that can be used to stabilize commodity prices. As a result, developing an accurate and precise forecasting model is critical. The current study fitted and forecasted the food and beverages price index (FBPI) in Kenya using seasonal autoregressive integrated moving average (SARIMA) models. Unlike other ARIMA models like the autoregressive (AR), moving average (MA), and non-seasonal ARMA models, the SARIMA model accounts for the seasonal component in a given time series, which yields better forecasts. The study relied on secondary data obtained from the KNBS website on the monthly food and beverage price index in Kenya from January 1991 to February 2020. The R statistical software was used to analyze the data. The parameter estimation was done using the Maximum Likelihood Estimation method. Competing SARIMA models were compared using the Mean Absolute Error (MAE), Mean Absolute Scaled Error (MASE), and Mean Absolute Percentage Error (MAPE). A first-order differenced SARIMA (1,1,1)(0,1,1)₁₂ minimized these model evaluation criteria (AIC = 1818.15, BIC = 1833.40). The forecasting ability evaluation statistics were MAE = 2.00%, MAPE = 1.62% and MASE = 0.87%. The 24-step ahead forecasts showed that the FBPI is unstable with an overall increasing trend. Therefore, the monetary policy committee ought to control inflation through monetary or fiscal policy, strengthening food security and trade liberalization.
  • Item
    PLS Generalized Linear Regression and Kernel Multilogit Algorithm (KMA) for Microarray Data Classification Problem
    (2020) Wagala, Adolphus; González-Farías, Graciela; Ramos, Rogelio; Dalmau, Oscar
    This study involves the implementation of extensions of the partial least squares generalized linear regression (PLSGLR) by combining it with logistic regression and linear discriminant analysis, to get a partial least squares generalized linear regression-logistic regression model (PLSGLR-log), and a partial least squares generalized linear regression-linear discriminant analysis model (PLSGLRDA). A comparative study of the obtained classifiers with classical methodologies like the k-nearest neighbours (KNN), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), ridge partial least squares (RPLS), and support vector machines (SVM) is then carried out. Furthermore, a new methodology known as the kernel multilogit algorithm (KMA) is also implemented and its performance compared with those of the other classifiers. The KMA emerged as the best classifier based on the lowest classification error rates compared to the others when applied to the types of data considered: the unpreprocessed and the preprocessed.
  • Item
    On Similarity and Quasisimilarity equivalence relations
    (2012-01) Sitati, Isaiah Nalianya; Musundi, Sammy Wabomba; Nzimbi, Benard Mutuku; Kirimi, Jacob
    Similarity and unitary equivalence can be shown to be equivalence relations. We discuss a result showing that two similar operators have equal spectra (i.e. point and approximate point spectrum). Moreover, unitary equivalence results for invariant subspaces and normal operators are proved. For similar normal operators, we state the Fuglede-Putnam-Rosenblum theorem, which simplifies proofs for similar normal operators. It is also noted that direct sums and summands are preserved under unitary equivalence. Furthermore, we show that the natural concept of equivalence between Hilbert space operators is unitary equivalence, which is stronger than similarity. We introduce the notion of quasisimilarity of operators, which is the same as similarity in finite dimensional spaces but is a much weaker relation in infinite dimensional spaces, and further show that quasisimilarity is an equivalence relation. We also link invariant subspaces and hyperinvariant subspaces with quasisimilarity, where it is seen that similarity preserves nontrivial invariant subspaces while quasisimilarity preserves nontrivial hyperinvariant subspaces.
  • Item
    On The Convexity of A Generalized q-Numerical Range
    (2005-01) Musundi, Sammy Wabomba
    For a given $q \in \mathbb{C}$ with $|q| \le 1$, we study the $C$-numerical range of a Hilbert space operator where $C$ is an operator of the form \[ \left( \begin{array}{cc} qI_n & \sqrt{1-|q|^2}\,I_n \\ 0_n & 0_n \end{array} \right) \oplus 0. \] Some known results on the $q$-numerical range are extended to this set.
  • Item
    Equivalent Banach operator ideal norms
    (2012-01) Musundi, S. Wambomba; Shem, Aywa; Fourie, Jan
    Let X, Y be Banach spaces and consider the w'-topology (the dual weak operator topology) on the space L(X, Y) of bounded linear operators from X into Y with the uniform operator norm. L_{w'}(X, Y) is the space of all T ∈ L(X, Y) for which there exists a sequence of compact linear operators (T_n) ⊂ K(X, Y) such that T = w'-lim_n T_n. Two equivalent norms, on L_{w'}(X, Y) are considered. We show that and Banach operator ideals.