Weisfeiler-Leman graph kernels for the out-of-distribution characterization of graph structured data
Date
2024
Abstract
This thesis presents a new metric named Graph Distributional Analytics (GDA), which combines Weisfeiler-Leman kernels, cosine similarity, and traditional statistical metrics to better characterize graph-structured data. It aims to improve the analysis of graph-structured data and to enhance the explainability and power of Graph Neural Networks (GNNs) without introducing a new model architecture. Existing GNN research frequently makes strong claims of out-of-distribution (OOD) generalizability, but these claims fail when the models are exposed to real-world data. We argue that existing standards for identifying OOD data are insufficient and that a metric is needed that accurately and efficiently identifies data that actually differs from the training data. Our metric accurately identifies OOD data, which allows researchers to make realistic claims about model generalizability.
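How Weisfeiler-Leman (WL) kernels, cosine similarity, and a statistical threshold can be combined to flag OOD graphs is easiest to see in a small example. The following is a minimal sketch, not the GDA implementation from the thesis. The graph format, the number of WL iterations, the nearest-neighbour scoring, and the "mean minus k standard deviations" threshold are all assumptions, and the names `wl_features`, `ood_flags`, `path`, and `k` are hypothetical.

```python
import hashlib
import math
import statistics
from collections import Counter

# Hypothetical graph format: (adjacency, labels), where adjacency maps
# node -> list of neighbours and labels maps node -> categorical label.


def wl_features(graph, iterations=3):
    """Count WL subtree labels over iterations 0..iterations."""
    adj, labels = graph
    labels = dict(labels)
    features = Counter(labels.values())  # iteration 0: original node labels
    for _ in range(iterations):
        new_labels = {}
        for node, neighbours in adj.items():
            # Relabel a node by its own label plus the sorted multiset of neighbour labels.
            neigh = ",".join(sorted(str(labels[n]) for n in neighbours))
            signature = f"{labels[node]}|{neigh}"
            # A stable hash keeps labels short and identical across graphs.
            new_labels[node] = hashlib.sha1(signature.encode()).hexdigest()[:16]
        labels = new_labels
        features.update(labels.values())
    return features


def cosine(a, b):
    """Cosine-normalised WL subtree kernel: dot product of label counts over both norms."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_similarity(query, references):
    """Similarity of one graph to its most similar reference graph."""
    return max(cosine(query, ref) for ref in references)


def ood_flags(train_graphs, test_graphs, iterations=3, k=2.0):
    """Flag test graphs whose nearest-neighbour similarity is below mean - k * std
    of the leave-one-out similarities inside the training set (needs >= 2 training graphs)."""
    train = [wl_features(g, iterations) for g in train_graphs]
    loo = [nearest_similarity(f, train[:i] + train[i + 1:]) for i, f in enumerate(train)]
    threshold = statistics.mean(loo) - k * statistics.pstdev(loo)
    return [nearest_similarity(wl_features(g, iterations), train) < threshold
            for g in test_graphs]


def path(n, label="C"):
    """Hypothetical helper: an n-node path graph whose nodes all share one label."""
    adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
    return adj, {i: label for i in range(n)}


if __name__ == "__main__":
    train = [path(3), path(4), path(5), path(6)]
    test = [path(5), path(4, label="N")]
    print(ood_flags(train, test))  # → [False, True]
```

Cosine normalisation makes graphs of different sizes comparable, and the leave-one-out statistics calibrate the threshold to how spread out the training set already is. In this sketch, a larger `k` produces fewer false alarms at the cost of missing more OOD graphs.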
Extensive experiments confirm the effectiveness of this metric through comparative analysis against traditional methods. Our study shows that GDA outperforms existing metrics in detecting OOD instances. This capability is needed in applications that depend on the generalizability of GNNs, such as drug effectiveness studies, protein interaction classification, and complex network systems in telecommunications and social media analysis. The thesis also explores how this metric affects the explainability of GNNs, revealing the behavior and decision-making processes of these models.
Applying GDA to curriculum transfer learning optimizes data usage and computational efficiency. By strategically introducing training data, the models adapt progressively, which improves accuracy and generalization across various graph-based tasks.
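One way to picture "strategically introducing training data" is a curriculum that presents the samples most similar to the original training distribution first and widens the pool in stages. This is a hypothetical schedule, not the procedure used in the thesis. The function name `curriculum_stages`, the equal-sized cumulative stages, and the use of a similarity score as the difficulty measure are assumptions made for illustration.

```python
import math


def curriculum_stages(samples, similarity, num_stages=3):
    """Hypothetical curriculum: split samples into cumulative stages, most similar first.

    `similarity` maps a sample to its similarity with the original training
    distribution (for example a WL/cosine score); higher is treated as easier.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(samples, key=similarity, reverse=True)
    stage_size = math.ceil(len(ordered) / num_stages)
    # Each stage keeps every earlier sample and adds the next slice of harder ones.
    return [ordered[: stage_size * (s + 1)] for s in range(num_stages)]


if __name__ == "__main__":
    scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.95, "e": 0.1}
    for stage in curriculum_stages(list(scores), scores.get):
        print(stage)
```

Cumulative stages keep earlier, easier samples in the training pool rather than discarding them. This choice is common in curriculum schedules but is, again, an assumption here.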
This work does not propose a new GNN architecture. Instead, it offers a methodology for better understanding and analyzing the data these models process. The contributions of this thesis extend beyond academic theory to practical applications, where improving the accuracy and efficiency of GNNs can lead to significant advancements in bioinformatics, chemistry, code analysis, and network security.
Table of Contents
Introduction -- Background -- Related work -- Methodology -- Results and discussion -- Conclusion
Degree
M.S. (Master of Science)