Software infrastructure for bioinformatics research: modernization and workflow automation

Format

Thesis

Abstract

[EMBARGOED UNTIL 12/01/2026] Bioinformatics platforms must evolve continuously to support the increasing scale and complexity of biological data. Legacy systems, especially those built on outdated architectures, often face limitations in scalability, security, and interoperability. Modernizing these systems keeps computational frameworks maintainable, efficient, and capable of integrating new analytical workflows. Workflow automation further improves reproducibility and usability, both of which are essential in computational biology. This thesis presents the modernization and upgrade of the legacy bioinformatics system KBCommons, as well as the development of sustainable and scalable infrastructure in CrossMP. KBCommons, originally released as KBCommons v1.1 in 2019, provides a multi-omics knowledge base for integrating data across diverse organisms. It has since undergone a complete redesign of both its user interface and its back-end architecture. New features allow researchers to upload and process SNP, miRNA, methylation, RNA-seq, and proteomics datasets through automated pipelines. A dual-server workflow pipeline separates web operations from computational tasks, improving scalability and overall performance. Secure data transfer is achieved through integration with Google APIs. The second project, CrossMP, is a lightweight application that extends the infrastructure to support machine-learning workflows. Its job-queue and scheduling environment manages data-intensive analyses, tracks task progress, and notifies users when jobs are completed. Integration with the Google Drive API enables secure handling of the large datasets used in computation. Together, these systems demonstrate how software modernization and workflow automation can transform legacy bioinformatics tools into sustainable, user-friendly infrastructures for reproducible, high-throughput biological discovery.
This thesis also compares the architecture and performance of two different application designs.
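
The job-queue behavior summarized in the abstract (queue analyses, track task progress, notify users on completion) can be illustrated with a minimal sketch. This is not the CrossMP implementation, which the abstract does not describe in detail. The names `Job`, `JobQueue`, `submit`, and the `notify` callback are hypothetical, and a single in-process worker thread stands in for the separate computational server.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    """Hypothetical job record; status moves queued -> running -> done | failed."""
    job_id: str
    task: Callable[[], object]
    status: str = "queued"
    result: object = None


class JobQueue:
    """Illustrative FIFO job queue with one background worker (an assumption)."""

    def __init__(self, notify: Callable[[Job], None]):
        self._pending = queue.Queue()
        self.jobs = {}            # job_id -> Job, used for progress lookups
        self._notify = notify     # called once per job when it finishes
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, job_id: str, task: Callable[[], object]) -> Job:
        job = Job(job_id, task)
        self.jobs[job_id] = job
        self._pending.put(job)
        return job

    def _run(self) -> None:
        while True:
            job = self._pending.get()
            job.status = "running"
            try:
                job.result = job.task()
                job.status = "done"
            except Exception as exc:  # record the failure instead of crashing the worker
                job.result = exc
                job.status = "failed"
            self._notify(job)          # notify before marking the queue item finished
            self._pending.task_done()

    def wait(self) -> None:
        """Block until every submitted job has been processed."""
        self._pending.join()
```

In a deployment like the one the abstract describes, `notify` would presumably send an e-mail or update a web page rather than call a local function, and the worker would run on the computational server; both details are assumptions here.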

Degree

M.S.
