A new filtering index for fast processing of SPARQL queries

Paturi, Srivenu

URI

http://hdl.handle.net/10355/39099

dc.contributor.advisor	Rao, Praveen R.	eng
dc.contributor.author	Paturi, Srivenu	eng
dc.date.issued	2013	eng
dc.date.submitted	2013 Summer	eng
dc.description	Title from PDF of title page, viewed on October 21, 2013	eng
dc.description	Vita	eng
dc.description	Thesis advisor: Praveen Rao	eng
dc.description	Includes bibliographical references (pages 78-82)	eng
dc.description	Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2013	eng
dc.description.abstract	The Resource Description Framework (RDF) has become a popular data model for representing data on the Web. Using RDF, any assertion can be represented as a (subject, predicate, object) triple. Essentially, RDF datasets can be viewed as directed, labeled graphs. Queries on RDF data are written using the SPARQL query language and contain basic graph patterns (BGPs). We present a new filtering index and query processing technique for processing large BGPs in SPARQL queries. Our approach called RIS treats RDF graphs as "first-class citizens." Unlike previous scalable approaches that store RDF data as triples in an RDBMS and process SPARQL queries by executing appropriate SQL queries, RIS aims to speed up query processing by reducing the processing cost of join operations. In RIS, RDF graphs are mapped into signatures, which are multisets. These signatures are grouped based on a similarity metric and indexed using Counting Bloom Filters. During query processing, the Counting Bloom Filters are checked to filter out non-matches, and finally the candidates are verified using Apache Jena. The filtering step prunes away a large portion of the dataset and results in faster processing of queries. We have conducted an in-depth performance evaluation using the Lehigh University Benchmark (LUBM) dataset and SPARQL queries containing large BGPs. We compared RIS with RDF-3X, which is a state-of-the-art scalable RDF querying engine that uses an RDBMS. RIS can significantly outperform RDF-3X in terms of total execution time for the tested dataset and queries.	eng
dc.description.tableofcontents	Introduction -- Motivation and related work -- Background -- Bloom filters and Bloom counters -- System architecture -- Signature tree generation -- Querying the signature tree -- Evaluation -- Experiments -- Conclusion	eng
dc.format.extent	xiv, 83 pages	eng
dc.identifier.uri	http://hdl.handle.net/10355/39099	eng
dc.subject.lcsh	Web sites -- Indexing and abstracting	eng
dc.subject.lcsh	Query languages (Computer science)	eng
dc.subject.other	Thesis -- University of Missouri--Kansas City -- Computer science	eng
dc.title	A new filtering index for fast processing of SPARQL queries	eng
dc.type	Thesis	eng
thesis.degree.discipline	Computer Science (UMKC)	eng
thesis.degree.grantor	University of Missouri--Kansas City	eng
thesis.degree.level	Masters	eng
thesis.degree.name	M.S.	eng
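
The filtering step summarized in the abstract (multiset signatures checked against Counting Bloom Filters before verification) can be illustrated with a minimal Python sketch. This is not the thesis's RIS implementation. The class `CountingBloomFilter`, the helpers `signature`, `build_filter`, and `is_candidate`, the choice of predicate labels as signature elements, and the filter sizes are all assumptions made for illustration. The grouping of signatures by similarity and the final verification with Apache Jena are not shown.

```python
import hashlib
from collections import Counter


class CountingBloomFilter:
    """Bloom filter variant that keeps small counters instead of single bits."""

    def __init__(self, num_counters=64, num_hashes=3):
        # Sizes are illustrative assumptions, not values from the thesis.
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive num_hashes counter positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item, count=1):
        for pos in self._positions(item):
            self.counters[pos] += count

    def might_contain(self, item, count=1):
        # False means the item was definitely not added `count` times.
        # True may be a false positive, so candidates still need verification.
        return all(self.counters[pos] >= count for pos in self._positions(item))


def signature(triples):
    """Hypothetical signature: the multiset of predicate labels in a graph."""
    return Counter(predicate for _, predicate, _ in triples)


def build_filter(sig, **kwargs):
    """Index one signature (a multiset) in a fresh counting Bloom filter."""
    cbf = CountingBloomFilter(**kwargs)
    for label, count in sig.items():
        cbf.add(label, count)
    return cbf


def is_candidate(cbf, query_sig):
    """Keep a graph only if every query label may occur often enough in it."""
    return all(cbf.might_contain(label, count) for label, count in query_sig.items())


graph = [
    ("s1", "advisor", "p1"),
    ("s1", "takesCourse", "c1"),
    ("s1", "takesCourse", "c2"),
]
graph_filter = build_filter(signature(graph))
query = [("?x", "takesCourse", "?c"), ("?x", "advisor", "?p")]
print(is_candidate(graph_filter, signature(query)))
```

Because counters only grow on insertion, a graph whose signature really covers the query signature is never filtered out. A graph that passes the filter is only a candidate, which is why a separate verification step is still needed.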

Files in this item

Name: PaturiNewFilInd.pdf
Size: 2.317Mb
Format: PDF
Description: A new filtering index for fast ...

This item appears in the following Collection(s)

Computer Science and Electrical Engineering Electronic Theses and Dissertations (UMKC)
The items in this collection are the scholarly output of UMKC graduate students.
2013 UMKC Theses - Freely Available Online