Data Science and Analytics electronic theses and dissertations (MU)

Recent Submissions

  • Item
    A data science approach to integrating hybrid type, environmental, weather, and soil data for predicting maize yield
    (University of Missouri--Columbia, 2025) Ghani, Abdul; Haithcoat, Timothy
    [EMBARGOED UNTIL 05/01/2026] Accurate yield prediction is essential for guiding agronomic practices, supporting crop breeding, and informing policy decisions. This thesis presents a data-driven modeling framework for predicting maize yield by integrating hybrid type, environmental, soil, and weather datasets from the Genomes to Fields (G2F) initiative (2014-2023). The dataset comprises over 5,000 maize hybrids evaluated across more than 270 environments. After cleaning, harmonizing, and engineering features across datasets, several machine learning models were tested using different categorical encoding techniques. CatBoost combined with Pandas Categorical encoding emerged as the best-performing model, achieving a test RMSE of 2.591 Mg/ha and a Pearson correlation of 0.687. Model tuning through Grid Search, Random Search, Bayesian Optimization, and Hyperopt further enhanced performance stability. Model interpretability was achieved using SHAP and LIME, which identified the most influential variables affecting maize yield prediction, including hybrid type, soil texture, planting date, and weather conditions during flowering and grain filling. The model's effectiveness was demonstrated in two key use cases: (1) recommending top hybrids for farmers in specific environments and suggesting hybrids for simulated wet and dry conditions, and (2) assessing hybrid performance across environments to support breeders and researchers. This study presents a strong, modular, and interpretable approach to predicting crop yields that can be extended to other crops and tailored for various stakeholders in agriculture. The work lays the groundwork for data-informed agricultural planning, hybrid evaluation, and climate-resilient crop management strategies.
  • Item
    Denoising techniques and their role in enhancing plant single-cell RNA-seq data quality
    (University of Missouri--Columbia, 2025) Awan, Sania Zafar; Mirielli, Edward
    [EMBARGOED UNTIL 05/01/2026] Single-cell RNA sequencing (scRNA-seq) significantly advances our ability to explore complex biological systems by providing gene expression profiles at the cellular level. However, this technology is still vulnerable to technical noise, including dropout effects and insufficient detection sensitivity, which can obscure authentic biological signals. While various denoising techniques have been proposed to address these challenges, their effectiveness has primarily been assessed using human and mouse datasets, creating a notable gap in understanding how these methods apply to plant systems. This research develops a pipeline to benchmark three advanced denoising methods, MAGIC, Deep Count Autoencoder (DCA), and scVI, applied to plant single-cell transcriptomics data. The study evaluates how denoising affects critical downstream analyses, such as clustering accuracy, the resolution of transcriptional subpopulations, and the ability to recover marker genes. Additionally, we consider computational factors such as runtime efficiency, scalability, and reproducibility, which are crucial for integrating these methods into plant research workflows. In contrast to studies that prioritize marker gene discovery, this research positions denoising as an essential process for improving data quality and interpretability within plant scRNA-seq workflows. The findings create a replicable framework for benchmarking denoising methods in non-model organisms and highlight specific trade-offs researchers must consider when selecting a denoising strategy. The pipeline also offers options for automatic hyperparameter tuning of models such as DCA and scVI.
  • Item
    Influence role recognition and scholar recommendation in Academic Social Networks
    (University of Missouri--Columbia, 2024) Edara, Lakshmi Srinivas; Calyam, Prasad
    Identifying scholars and their relevant publications within interdisciplinary collaborations in an Academic Social Network (ASN) is pivotal for advancing scientific knowledge. This task is inherently complex and time-consuming, requiring precise recognition of a scholar's influence role within a team for specific research activities. This thesis introduces the novel "ScholarInfluencer" recommendation system, which employs a two-pronged approach: (a) a classification model combined with network analysis on a heterogeneous knowledge graph identifies scholar influencers within interdisciplinary teams, and (b) a large language model (LLM) utilizes these influence role recognition results to respond to user queries, generating pertinent scholar and publication recommendations. This innovative approach involves constructing a heterogeneous knowledge graph from ASN datasets, which include entities such as scholars, publications, research grants, and their interrelations. The efficacy of the ScholarInfluencer system is evaluated using four prominent ASN datasets: NSF, DBLP, Cora, and CA-HepTh. The results indicate that the influence role recognition model significantly surpasses existing models, particularly showing a 13.6 percent improvement on the NSF dataset. Additionally, the recommendation model incorporating role recognition consistently outperforms its counterpart without role recognition across all datasets, with a notable 7 percent higher performance on the NSF dataset.
  • Item
    Reducing false fall alerts in fall detection using deep learning models
    (University of Missouri--Columbia, 2024) Mishra, Rameswari; Skubic, Marjorie
    Fall-related injuries pose a significant threat to older adults, with a 50 percent likelihood of mortality within six months for those who remain immobilized for over an hour after a fall. Effective fall detection and intervention strategies are imperative, yet distinguishing genuine falls from false alarms presents a challenge. The Center for Eldercare and Rehabilitation Technology at the University of Missouri, Columbia, addressed this challenge by deploying a fall detection system at the TigerPlace senior living facility in 2014. Despite advancements, false alarms persisted. In 2021, a supplementary analysis system utilizing Inception V3 and LSTM networks was introduced to further reduce false alarms. A key component of this system was the use of depth sensors instead of RGB cameras. Depth sensors were chosen primarily for privacy reasons, as they do not capture detailed visual images of individuals, thus minimizing concerns related to personal privacy and ensuring compliance with ethical standards. Despite these improvements, challenges remain due to false alarms triggered by various factors. To address this, the present study employs YOLOv5 for human detection in frames of the depth videos. The study includes preprocessing steps and runs Inception V3 and LSTM networks to verify accuracy and thresholds. The YOLOv5s model is applied to the preprocessed videos, as well as to the dataset generated after the LSTM model run. The results of the LSTM+YOLOv5s combination, the LSTM alone, and YOLOv5s alone are compared to evaluate the effectiveness of the LSTM+YOLOv5s model in detecting falls; the combination reduces false fall alarms by 11 percent. The training dataset was manually annotated, and the resulting model accuracy was manually evaluated to ensure robustness, addressing privacy concerns comprehensively. This strategic approach aims to reduce false alerts and enhance fall detection systems, crucial for the well-being of elderly individuals.
    Achieving a balance between swift response and minimizing disruptions is essential for comprehensive fall management in aging populations.
  • Item
    Mapping spatial disparity in news provision : a study of news deserts in rural Missouri
    (University of Missouri--Columbia, 2024) De Jesus, Yves; Haithcoat, Timothy
    Despite extensive discussions on news deserts, a detailed examination of news provision within these areas remains limited. This study explores the scope of news provision and its spatial distribution in rural Missouri in the context of socio-economic factors. Our analysis of geo-tagged articles uncovers notable variations in news availability across rural cities, highlighting the intricate interplay between local demographics and the distribution of news provision. Using geo-tagged articles as a lens, the study examines the distribution patterns of news content across different bivariate classes, revealing disparities in news provision based on areal characteristics. The analysis reveals spatial disparities in news coverage, influenced by median household income levels and diversity indices, with Jefferson City's unique socio-economic profile playing a significant role in shaping the distribution patterns. To measure the relationship between the bivariate classes and article count distribution, this study used Poisson regression analysis, which provided statistically significant insights into the dynamics of news coverage distribution across bivariate classes. The analysis found that median household income generally correlates with increased news coverage, confirming existing literature. Additionally, the regression analysis highlighted the inverse relationship between diversity levels and news coverage within rural areas. Specifically, as diversity levels increase, there is a consistent decrease in the count of news articles, even when controlling for median household income. These results show that higher diversity is associated with substantially lower news coverage, underscoring the interplay between socio-economic characteristics and media coverage.
    These findings demonstrate the importance of contextualizing the socio-economic dynamics of individual cities in interpreting the spatial distribution of news coverage, and they support the case for more inclusive and equitable news ecosystems within rural communities. By examining the spatial dynamics of news provision, this study informs efforts to address gaps in media coverage and foster a more balanced media landscape in rural settings, offering valuable insights to the fields of local journalism, community dynamics, and media studies.