dc.contributor.advisor | Prasad, Calyam | eng
dc.contributor.author | Zhang, Yuanxun | eng
dc.date.issued | 2020 | eng
dc.date.submitted | 2020 Fall | eng
dc.description.abstract | Recent science and engineering research tasks are increasingly data-intensive and use workflows to automate the integration and analysis of voluminous data to test hypotheses. In particular, bold scientific advances in areas of neuroscience and bioinformatics necessitate access to multiple data archives, heterogeneous software and computing resources, and multi-site interdisciplinary expertise. Datasets are evolving, and new tools are continuously invented to achieve new state-of-the-art performance. Principled cyber and software automation approaches to data-intensive analytics, using systematic integration of cyberinfrastructure (CI) technologies and knowledge-discovery-driven algorithms, will significantly enhance research and interdisciplinary collaborations in science and engineering. In this thesis, we demonstrate a novel recommender approach to discover latent knowledge patterns from both the infrastructure perspective (i.e., measurement recommender) and the applications perspective (i.e., topic recommender and scholar recommender). From the infrastructure perspective, we identify and diagnose network-wide anomaly events to address performance bottlenecks by proposing a novel measurement recommender scheme. Where ground truth is lacking in network performance monitoring (e.g., perfSONAR deployments), it is hard to pinpoint root causes in a multi-domain context. To solve this problem, we define a "social plane" concept that relies on recommendation schemes to share diagnosis knowledge or work collaboratively. Our solution makes it easier for network operators and application users to quickly and effectively troubleshoot performance bottlenecks on wide-area network backbones. To evaluate our "measurement recommender", we use both real and synthetic datasets. The results show that our measurement recommender scheme achieves high performance in terms of precision, recall, and accuracy, as well as efficiency in terms of the time taken to analyze large-volume measurement traces. From the applications perspective, our goal is to shorten the time to knowledge discovery and adapt prior domain knowledge for computational and data-intensive communities. To achieve this goal, we design a novel topic recommender that leverages a domain-specific topic model (DSTM) algorithm to help scientists find the relevant tools or datasets for their applications. The DSTM is a probabilistic graphical model that extends Latent Dirichlet Allocation (LDA) and uses the Markov chain Monte Carlo (MCMC) algorithm to infer latent patterns within a specific domain in an unsupervised manner. We evaluate our scheme on large collections of data (i.e., publications, tools, datasets) from the bioinformatics and neuroscience domains. Our experimental results using the perplexity metric show that our model has better generalization performance within a domain for discovering highly specific latent topics. Lastly, to enhance collaborations among scholars to generate new knowledge, it is necessary to identify scholars with specific research interests or cross-domain expertise. We propose a "ScholarFinder" model to quantify expert knowledge based on publications and funding records using a deep generative model. Our model embeds scholars' knowledge in order to recommend suitable scholars to perform multi-disciplinary tasks. We evaluate our model against state-of-the-art baseline models (e.g., XGBoost, DNN), and the experimental results show that our ScholarFinder model outperforms them in terms of precision, recall, F1-score, and accuracy. | eng
dc.description.bibref | Includes bibliographical references (pages 113-124). | eng
dc.format.extent | xiii, 125 pages : illustrations | eng
dc.identifier.uri | https://hdl.handle.net/10355/88909
dc.language | English | eng
dc.publisher | University of Missouri--Columbia | eng
dc.title | Knowledge discovery with recommenders for big data management in science and engineering communities | eng
dc.type | Thesis | eng
thesis.degree.discipline | Computer Science | eng
thesis.degree.level | Doctoral | eng
thesis.degree.name | Ph.D. | eng

