Enriching Knowledge Graphs Using Machine Learning Techniques
A knowledge graph represents millions of facts and reliable information about people, places, and things. Knowledge graphs have proven reliable and useful for providing better search results, answering ambiguous questions about entities, and training semantic parsers to enhance semantic relationships over the Semantic Web. However, while a plethora of datasets related to Food, Energy, and Water (FEW) exists on the Internet, there is a real lack of reliable methods and tools that can consume these resources. This hinders the development of novel decision-making applications that utilize knowledge graphs. In this dissertation, we introduce a novel tool, called FoodKG, that enriches FEW knowledge graphs using advanced machine learning techniques. Our overarching goal is to improve decision-making and knowledge discovery and to provide better search results for data scientists in the FEW domains. Given an input knowledge graph (constructed from raw FEW datasets), FoodKG enriches it with semantically related triples, relations, and images based on the original dataset terms and classes. FoodKG employs an existing graph embedding technique trained on a controlled vocabulary called AGROVOC, which is published by the Food and Agriculture Organization of the United Nations. AGROVOC includes terms and classes in the agriculture and food domains. As a result, FoodKG can enhance knowledge graphs with semantic similarity scores and relations between different classes, classify the existing entities, and allow FEW experts and researchers to use scientific terms for describing FEW concepts. The model obtained after training on AGROVOC was evaluated against state-of-the-art word embedding and knowledge graph embedding models trained on the same dataset. We observed that this model outperformed its competitors as measured by the Spearman correlation coefficient.
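As a rough illustration of how an embedding model can be scored with the Spearman correlation coefficient, the sketch below ranks term pairs by cosine similarity and compares that ranking with reference similarity judgments. The vectors, term pairs, and reference scores are invented toy values, and the function names are hypothetical; none of them come from FoodKG or AGROVOC.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy embeddings and reference scores (illustrative assumptions only).
embeddings = {
    "rice": [0.9, 0.1, 0.0],
    "wheat": [0.8, 0.2, 0.1],
    "irrigation": [0.1, 0.9, 0.2],
    "solar": [0.0, 0.2, 0.9],
}
pairs = [
    ("rice", "wheat", 0.9),
    ("rice", "irrigation", 0.5),
    ("rice", "solar", 0.1),
    ("irrigation", "solar", 0.3),
]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in pairs]
reference_scores = [score for _, _, score in pairs]
rho = spearman(model_scores, reference_scores)
print(f"Spearman correlation: {rho:.2f}")
```

Because the measure depends only on rank order, a model is rewarded for ordering pairs the same way as the reference judgments, regardless of the absolute similarity values it produces.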
We also introduce Federated Learning (FL) techniques to extend our work to private datasets by training smaller versions of the models at each dataset site without accessing the data and then aggregating all the models on the server side. We propose an algorithm, called RefinedFed, that extends current FL work by filtering the models at each dataset site before the aggregation phase. Our algorithm improves the accuracy of the current FL model from 84% to 91% on the MNIST dataset.
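The general idea of filtering site models before aggregation can be sketched as follows. This is not the RefinedFed algorithm itself: the abstract does not state the filtering criterion, so the sketch assumes a hypothetical rule that keeps only models whose local validation accuracy meets a threshold, and then applies standard sample-weighted federated averaging. All names and values are illustrative.

```python
def filter_models(client_updates, min_accuracy):
    """Keep updates whose local validation accuracy meets the threshold.

    The threshold rule is an assumption made for this sketch; the
    source does not specify how models are filtered.
    """
    return [u for u in client_updates if u["val_accuracy"] >= min_accuracy]

def federated_average(client_updates):
    """Average model weights, weighting each site by its sample count."""
    if not client_updates:
        raise ValueError("no client updates to aggregate")
    total = sum(u["num_samples"] for u in client_updates)
    dim = len(client_updates[0]["weights"])
    averaged = [0.0] * dim
    for u in client_updates:
        share = u["num_samples"] / total
        for i, w in enumerate(u["weights"]):
            averaged[i] += share * w
    return averaged

# Toy updates from three hypothetical sites; only weights and summary
# statistics leave each site, never the raw data.
updates = [
    {"weights": [1.0, 2.0], "num_samples": 100, "val_accuracy": 0.9},
    {"weights": [3.0, 4.0], "num_samples": 300, "val_accuracy": 0.8},
    {"weights": [100.0, -100.0], "num_samples": 100, "val_accuracy": 0.2},
]

kept = filter_models(updates, min_accuracy=0.5)
global_weights = federated_average(kept)
print(global_weights)
```

In this toy run the third site's poorly performing model is dropped, so it cannot pull the aggregated weights away from the two consistent sites.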