A New Approach for Fast Processing of SPARQL Queries on RDF Quadruples
The Resource Description Framework (RDF) is a standard model for representing data on the Web. It enables the interchange and machine processing of data by considering its semantics. While RDF was first proposed with the vision of enabling the Semantic Web, it has now become popular in domain-specific applications and on the Web. Through advanced RDF technologies, one can perform semantic reasoning over data and extract knowledge in domains such as healthcare, biopharmaceuticals, defense, and intelligence. Popular approaches like RDF-3X perform poorly on RDF datasets containing billions of triples when the queries are large and complex, because of the large number of join operations that must be performed during query processing. Moreover, most scalable approaches were designed to operate on RDF triples rather than quads. To address these issues, we propose a new approach for fast and cost-effective processing of SPARQL queries on large datasets of RDF quadruples (or quads). Our approach employs a decrease-and-conquer strategy: rather than indexing the entire RDF dataset, it identifies groups of similar RDF graphs and indexes each group separately. During query processing, it uses a novel filtering index to first identify candidate groups that may contain matches for the query. It then executes the query on these candidates using a conventional SPARQL processor to produce the final results. A query optimization strategy that exploits the candidate groups further improves query processing performance.
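The filter-then-execute flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the groups are assigned by hand rather than discovered by similarity, the "filtering index" is a plain set of constants per group standing in for the novel index (whose structure the abstract does not describe), and a naive backtracking matcher stands in for the conventional SPARQL processor. All function names and the sample data are hypothetical.

```python
from collections import defaultdict

def is_var(term):
    """Query variables are written with a leading '?'."""
    return term.startswith("?")

def build_filter_index(groups):
    """Per group, record every (position, term) pair seen in its quads."""
    return {gid: {(pos, q[pos]) for q in quads for pos in range(3)}
            for gid, quads in groups.items()}

def candidate_groups(index, patterns):
    """Keep groups that contain every constant of the query.
    May return false positives, but never drops a group with a real match."""
    needed = {(pos, t) for pat in patterns
              for pos, t in enumerate(pat) if not is_var(t)}
    return [gid for gid, sig in index.items() if needed <= sig]

def match_bgp(triples, patterns, binding):
    """Naive backtracking matcher (stand-in for a real SPARQL processor)."""
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_bgp(triples, rest, new)

def run_query(groups, index, patterns):
    """Evaluate the patterns inside each named graph of the candidate groups only."""
    results = []
    for gid in candidate_groups(index, patterns):
        by_graph = defaultdict(list)
        for s, p, o, g in groups[gid]:
            by_graph[g].append((s, p, o))
        for g, triples in by_graph.items():
            for b in match_bgp(triples, patterns, {}):
                results.append((g, b))
    return results

# Hypothetical data: quads are (subject, predicate, object, graph),
# already split into two hand-made groups of similar graphs.
groups = {
    "A": [("alice", "knows", "bob", "g1"),
          ("bob", "worksAt", "acme", "g1"),
          ("carol", "knows", "dave", "g2")],
    "B": [("p1", "treats", "flu", "g3"),
          ("p1", "dose", "10mg", "g3")],
}
patterns = [("?x", "knows", "?y"), ("?y", "worksAt", "?z")]

index = build_filter_index(groups)
print(candidate_groups(index, patterns))
print(run_query(groups, index, patterns))
```

In this toy run, group B is discarded by the filter without being searched, so the matcher only touches the graphs in group A. The saving in the sketch is trivial, but it shows why filtering before execution reduces the amount of data the query processor has to join over.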
Table of Contents
Introduction -- Background and motivations -- The design of RIQ -- Implementation of RIQ -- Evaluation -- Conclusion and future work -- Appendix A. Queries -- Appendix B. SPARQL grammar