A study of graph partitioning techniques for fast indexing and query processing of  a large RDF graph

Barenkala, Dinesh

A study of graph partitioning techniques for fast indexing and query processing of a large RDF graph

Files

BarenkalaStuGraPar.pdf (1.55 MB)

Authors

Barenkala, Dinesh

Date

2013

Format

Thesis

Abstract

In recent years, the Resource Description Framework (RDF) [34] has become increasingly important for the Web and in domains such as defense and healthcare. Companies such as the New York Times [40], Best Buy [39], and Pfizer are leveraging RDF and other Semantic Web technologies for data management. Using RDF, any assertion can be represented as a (subject, predicate, object) triple. The collection of triples together represents a graph. Many techniques have been developed for RDF indexing and query processing and the most popular among them store and process RDF data using an RDBMS. In this thesis, we study the impact of existing graph partitioning techniques on indexing and query processing of a large RDF graph (e.g., YAGO [40]) with millions of edges and vertices. Our goal is to partition a large RDF graph into smaller graphs and then index the smaller graphs efficiently for faster query processing. In order to cope with cut edges, we compute the 2-hop distance across each cut edge. Once partitions are computed, we construct an index using a recently developed technique called RIS. Queries are also processed using RIS. We report the benefits and trade-offs of two different partitioning strategies using the YAGO dataset on metrics such as index construction time, index size, and query processing time. The first partitioning strategy treats the original RDF graph as an unweighted graph during partitioning. The second strategy treats the original graph as a weighted graph during partitioning. We compared the results obtained by RIS (on partitioned graphs) with RDF-3X [38], a state-of-the-art RDF query processing engine.

URI

http://hdl.handle.net/10355/39098

Degree

M.S.

Thesis Department

Computer Science (UMKC)

Collections

2013 UMKC Theses - Freely Available Online
Computer Science and Electrical Engineering Electronic Theses and Dissertations (UMKC)

Full item page

A study of graph partitioning techniques for fast indexing and query processing of a large RDF graph

Files

Authors

Meeting name

Sponsors

Date

Journal Title

Format

Subject

Research Projects

Organizational Units

Journal Issue

Abstract

Table of Contents

URI

DOI

PubMed ID

Degree

Thesis Department

Rights

License

Collections