Obare Dominic Mong’are
2026-06-11
2026-06-11
2024
Obare, D. M. (2024). Improving accuracy of compressed mixed linear model: An application to genome-wide association studies (Master’s thesis, Chuka University). https://repository.chuka.ac.ke/handle/123456789/22950
A Thesis Submitted to the Graduate School in Partial Fulfilment of the Requirements for the Award of the Degree of Doctor of Philosophy in Applied Statistics of Chuka University
Supervisors: Prof. Moses Mahugu Muraya, Dr. Gladys Gakenia Njoroge

Mixed linear models are widely used across disciplines because they handle complex datasets robustly and account for structure in the data. Genome-wide association studies (GWAS) are central to genomic prediction and to the statistical modelling of genotype-phenotype relationships. GWAS and genomic prediction combine molecular markers with statistical models to detect variants of interest. Although several statistical models have been used in GWAS, advances in phenotyping and sequencing technologies call for improvements to the existing models in order to increase their statistical power. The general objective of this study was to develop an improved, enriched compressed mixed linear model that addresses accuracy and statistical power. The study took into account cumulative genetic variants that cause phenotypic differences at different developmental stages of the plant. Secondary data obtained from the database at IPK-Gatersleben, Germany, were used. The dataset consists of phenotypic data from 252 maize inbred lines and 50,000 single nucleotide polymorphism (SNP) markers. Data analysis was performed in R version 4.4.1. Three developmental stages were analysed: 11, 26 and 42 days after sowing (DAS). Plant phenotypic features such as volume, side area and height were used to predict plant biomass.
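For orientation, the single-marker mixed linear model commonly used in GWAS, and the way a compressed MLM (CMLM) modifies it, can be written as follows. The abstract does not give the thesis's own equations, so this notation ($\mathbf{y}$, $\mathbf{X}$, $\mathbf{s}$, $\mathbf{Z}$, $\mathbf{K}$, $\mathbf{K}_g$, group sizes $n_a$) is an assumption following the usual formulation, not a transcription of the thesis. Here $\mathbf{y}$ is the phenotype vector, $\mathbf{X}\boldsymbol{\beta}$ holds fixed effects such as population-structure covariates, $\mathbf{s}$ is the genotype vector of the SNP being tested with effect $\alpha$, $\mathbf{K}$ is the kinship matrix among individuals and $\mathbf{Z}$ is an incidence matrix:

$$
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{s}\alpha + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad \mathbf{u} \sim N(\mathbf{0}, \sigma_a^2\mathbf{K}),
\qquad \mathbf{e} \sim N(\mathbf{0}, \sigma_e^2\mathbf{I})
$$

In the compressed form, the $n$ individuals are clustered into $g$ groups, $\mathbf{u}$ becomes a vector of $g$ group effects with covariance $\sigma_g^2\mathbf{K}_g$, and $\mathbf{Z}$ becomes the $n \times g$ matrix assigning individuals to groups. One common choice of group kinship is the average kinship between members of groups $a$ and $b$:

$$
(\mathbf{K}_g)_{ab} = \frac{1}{n_a n_b}\sum_{i \in a}\sum_{j \in b} K_{ij},
\qquad \operatorname{Var}(\mathbf{y}) = \sigma_g^2\,\mathbf{Z}\mathbf{K}_g\mathbf{Z}^\top + \sigma_e^2\mathbf{I}
$$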
Single-trait analysis was done first (plant side area, height and volume), followed by two-trait combinations (plant volume + plant height, plant height + plant side area, plant volume + plant side area) and, lastly, the combination of all three traits (plant volume + plant height + plant side area). Across the developmental stages analysed, 6 SNPs were detected for plant side area, 8 for plant volume and 8 for plant height; 20 SNPs were detected for plant volume + plant height, 11 for plant volume + plant side area and 22 for plant volume + plant height + plant side area. These results underscore the importance of considering multiple composite traits simultaneously in GWAS to unravel the complex genetic correlations and synergistic effects that influence plant architecture and performance. Significant SNP associations shifted as plants progressed through the growth stages, highlighting the evolving genetic landscape during plant development. The Compressed Mixed Linear Model (CMLM) proved highly efficient in clustering individuals and identifying putative quantitative trait nucleotides (QTNs). Incorporating the composite phenotypic variables (plant volume, surface area and height) in the model produced the lowest Akaike information criterion (AIC) and Bayesian information criterion (BIC) values, 1967.630 and 1999.870 respectively, indicating a well-fitting and parsimonious model. Based on these results, the study recommends using machine learning techniques such as Random Forest and Lasso to select the most significant phenotypic features for predicting plant biomass. Combining the predicted biomass values from multiple variables through standardization, aggregation and summation generates a more informative composite feature.
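The clustering step credited to CMLM above can be illustrated with a small NumPy sketch. It builds a genomic kinship matrix from SNPs coded 0/1/2 (VanRaden-style scaling, an assumption) and then averages that matrix within and between groups to obtain the compressed group kinship. The group labels are taken as given: in practice CMLM derives them by clustering the kinship matrix, and the clustering method and number of groups used in the thesis are not stated in the abstract. The function names `vanraden_kinship` and `compress_kinship` are hypothetical.

```python
import numpy as np


def vanraden_kinship(M):
    """VanRaden-style genomic relationship matrix (illustrative sketch).

    M: n x m genotype matrix coded 0/1/2 (copies of one allele per SNP).
    Assumes at least one polymorphic marker; otherwise the denominator is 0.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                # allele frequency of each marker
    W = M - 2.0 * p                         # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))     # scaling so K resembles a relationship matrix
    return W @ W.T / denom


def compress_kinship(K, labels):
    """Average kinship within and between groups (the CMLM compression step).

    labels: one group label per individual, assumed to come from clustering K.
    Returns the g x g group kinship K_g and the n x g incidence matrix Z.
    """
    K = np.asarray(K, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    # Z[i, a] = 1 if individual i belongs to group a, else 0
    Z = (labels[:, None] == groups[None, :]).astype(float)
    counts = Z.sum(axis=0)                            # group sizes n_a
    # Z^T K Z sums K over all (i in a, j in b) pairs; divide by n_a * n_b for the mean
    K_g = (Z.T @ K @ Z) / np.outer(counts, counts)
    return K_g, Z
```

With `K_g, Z = compress_kinship(K, labels)`, the product `Z @ K_g @ Z.T` is the n x n covariance structure the compressed model uses in place of the full kinship matrix, which is where the reduction in the number of random effects comes from.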
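The AIC and BIC values reported above follow the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where L is the maximized likelihood, k the number of estimated parameters and n the number of observations. Lower values are better for both, and BIC penalizes extra parameters more heavily once n exceeds about 7. The sketch below shows how candidate models (for example, single-trait versus composite-trait fits) could be ranked. The log-likelihoods, parameter counts and the helper `best_model` are hypothetical, because the abstract reports only the final criterion values.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC) for a fitted model; lower is better for both.

    AIC = 2k - 2 ln L
    BIC = k ln(n) - 2 ln L
    """
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, bic


def best_model(candidates, n_obs, criterion="aic"):
    """Return the name of the candidate with the lowest AIC (or BIC).

    candidates: dict mapping a model name to (log_likelihood, n_params).
    """
    index = 0 if criterion == "aic" else 1
    return min(
        candidates,
        key=lambda name: information_criteria(*candidates[name], n_obs)[index],
    )
```

Such comparisons are only meaningful between models fitted to the same response and observations; mixed models fitted by REML should also share the same fixed effects if their criteria are compared.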
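The trait design and the recommended composite feature can be sketched in a few lines of standard-library Python. `trait_sets` enumerates the single, pairwise and three-way trait combinations described at the start of this paragraph (seven sets for three traits). `composite` standardizes each input to z-scores and sums them per individual. The abstract names "standardization, aggregation and summation" without giving the exact formula, so the use of population standard deviation and an unweighted sum is an assumption. The inputs, such as predicted biomass values per trait, and both function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscore(values):
    """Standardize to mean 0 and (population) SD 1; fails if all values are equal."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def composite(traits, names):
    """Per-individual sum of the z-scores of the selected traits.

    traits: dict mapping a trait name to a list of values, one per individual
    (e.g. biomass predicted from that trait).
    """
    standardized = [zscore(traits[name]) for name in names]
    return [sum(per_individual) for per_individual in zip(*standardized)]


def trait_sets(names):
    """All single, pairwise and higher-order combinations of the traits."""
    return [c for r in range(1, len(names) + 1) for c in combinations(names, r)]
```

Standardizing before summing keeps a trait measured on a large scale (for example, volume) from dominating a trait measured on a small scale (for example, height) in the composite.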
The composite variable provides a robust input for trait-SNP association in GWAS, as the improved results of this study demonstrate.
en
Compressed Mixed Linear Model (CMLM); Genome-Wide Association Studies (GWAS); Genomic Prediction; Single Nucleotide Polymorphism (SNP); Mixed Linear Models; Quantitative Trait Nucleotides (QTNs); Composite Phenotypic Traits
Improving accuracy of compressed mixed linear model: an application to genome-wide association studies
Thesis