dc.contributor.advisor | Rao, Praveen R. | eng |
dc.contributor.author | Paturi, Srivenu | eng |
dc.date.issued | 2013 | eng |
dc.date.submitted | 2013 Summer | eng |
dc.description | Title from PDF of title page, viewed on October 21, 2013 | eng |
dc.description | Vita | eng |
dc.description | Thesis advisor: Praveen Rao | eng |
dc.description | Includes bibliographical references (pages 78-82) | eng |
dc.description | Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2013 | eng |
dc.description.abstract | The Resource Description Framework (RDF) has become a popular data model for
representing data on the Web. Using RDF, any assertion can be represented as a (subject,
predicate, object) triple. Essentially, RDF datasets can be viewed as directed, labeled
graphs. Queries on RDF data are written using the SPARQL query language and contain
basic graph patterns (BGPs). We present a new filtering index and query processing
technique for processing large BGPs in SPARQL queries. Our approach, called RIS, treats
RDF graphs as "first-class citizens." Unlike previous scalable approaches that store RDF
data as triples in an RDBMS and process SPARQL queries by executing appropriate SQL
queries, RIS aims to speed up query processing by reducing the processing cost of join
operations. In RIS, RDF graphs are mapped into signatures, which are multisets. These
signatures are grouped based on a similarity metric and indexed using Counting Bloom
Filters. During query processing, the Counting Bloom Filters are checked to filter out
non-matches, and finally the candidates are verified using Apache Jena. The filtering step
prunes away a large portion of the dataset and results in faster processing of queries. We
have conducted an in-depth performance evaluation using the Lehigh University
Benchmark (LUBM) dataset and SPARQL queries containing large BGPs. We compared RIS with RDF-3X, which is a state-of-the-art scalable RDF querying engine that uses an RDBMS. RIS can significantly outperform RDF-3X in terms of total execution time for the tested dataset and queries. | eng |
dc.description.tableofcontents | Introduction -- Motivation and related work -- Background -- Bloom filters and Bloom counters -- System architecture -- Signature tree generation -- Querying the signature tree -- Evaluation -- Experiments -- Conclusion | eng |
dc.format.extent | xiv, 83 pages | eng |
dc.identifier.uri | http://hdl.handle.net/10355/39099 | eng |
dc.subject.lcsh | Web sites -- Indexing and abstracting | eng |
dc.subject.lcsh | Query languages (Computer science) | eng |
dc.subject.other | Thesis -- University of Missouri--Kansas City -- Computer science | eng |
dc.title | A new filtering index for fast processing of SPARQL queries | eng |
dc.type | Thesis | eng |
thesis.degree.discipline | Computer Science (UMKC) | eng |
thesis.degree.grantor | University of Missouri--Kansas City | eng |
thesis.degree.level | Masters | eng |
thesis.degree.name | M.S. | eng |