A study of graph partitioning techniques for fast indexing and query processing of a large RDF graph
Metadata[+] Show full item record
In recent years, the Resource Description Framework (RDF)  has become increasingly important for the Web and in domains such as defense and healthcare. Companies such as the New York Times , Best Buy , and Pfizer are leveraging RDF and other Semantic Web technologies for data management. Using RDF, any assertion can be represented as a (subject, predicate, object) triple. The collection of triples together represents a graph. Many techniques have been developed for RDF indexing and query processing and the most popular among them store and process RDF data using an RDBMS. In this thesis, we study the impact of existing graph partitioning techniques on indexing and query processing of a large RDF graph (e.g., YAGO ) with millions of edges and vertices. Our goal is to partition a large RDF graph into smaller graphs and then index the smaller graphs efficiently for faster query processing. In order to cope with cut edges, we compute the 2-hop distance across each cut edge. Once partitions are computed, we construct an index using a recently developed technique called RIS. Queries are also processed using RIS. We report the benefits and trade-offs of two different partitioning strategies using the YAGO dataset on metrics such as index construction time, index size, and query processing time. The first partitioning strategy treats the original RDF graph as an unweighted graph during partitioning. The second strategy treats the original graph as a weighted graph during partitioning. We compared the results obtained by RIS (on partitioned graphs) with RDF-3X , a state-of-the-art RDF query processing engine.
Table of Contents
Introduction -- Background -- Proposed design -- Implementation -- Performance evaluation -- Conclusion and future work -- Appendix A. Algorithms