A study of gossip algorithms for internet-scale cardinality estimation of distributed XML data

Slavov, Vasil Georgiev

URI

http://hdl.handle.net/10355/15636

dc.contributor.advisor	Rao, Praveen R.	eng
dc.contributor.author	Slavov, Vasil Georgiev	eng
dc.date.issued	2012-10-03	eng
dc.date.submitted	2012 Summer	eng
dc.description	Title from PDF of title page, viewed on October 3, 2012	eng
dc.description	Thesis advisor: Praveen R. Rao	eng
dc.description	Vita	eng
dc.description	Includes bibliographic references (p. 58-69)	eng
dc.description	Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2012	eng
dc.description.abstract	After more than a decade of active research and development, the peer-to-peer (P2P) computing model continues to be successful. We have witnessed the deployment of commercial P2P applications in large, Internet-scale environments. With the rise and growth of P2P, indexing and querying data stored in large-scale sharing systems has become increasingly di cult. Computing statistics over data stored in Internet-scale P2P systems is an important component of query optimization. Decentralized gossip-based protocols are very popular in networking, and in particular, in sensor networks. The simplicity and scalability of gossip protocols render them perfect for quickly computing accurate estimates of aggregates (sums, averages, etc.) in Internet-scale systems where node and link failures are the norm. In this thesis, we present the problem of cardinality estimation of XPath queries over XML data stored in a distributed, Internet-scale environment. We focus our work on three objectives: implementing gossip in an Internet-scale environment, conducting a comprehensive performance evaluation in a wide-area network, and analyzing the experimental results. We implement two gossip-based algorithms (VanillaXGossip and XGossip) which, given an XPath query, estimate the number of XML documents in the network that contain a match for the query. XGossip employs a new, divide-and-conquer strategy for load-balancing and reducing the bandwidth consumption. We conduct a comprehensive performance evaluation of both gossip algorithms on Amazon Elastic Compute Cloud (Amazon EC2) web service using a heterogeneous collection of XML documents. The goal of the performance evaluation is to nd if the results we obtain are consistent with the theoretical analysis of VanillaXGossip and XGossip.	eng
dc.description.tableofcontents	Introduction -- Background and motivations -- The design of VanillaXGossip and XGossip -- Implementation of VanillaXGossip and XGossip -- Evaluation -- Conclusion and future work -- Appendix A. Algorithms -- Appendix B. XPath grammar	eng
dc.format.extent	xi, 70 pages	eng
dc.identifier.uri	http://hdl.handle.net/10355/15636	eng
dc.publisher	University of Missouri--Kansas City	eng
dc.subject.lcsh	Peer-to-peer architecture (Computer networks)	eng
dc.subject.lcsh	Querying (Computer science)	eng
dc.subject.other	Thesis -- University of Missouri--Kansas City -- Computer science	eng
dc.title	A study of gossip algorithms for internet-scale cardinality estimation of distributed XML data	eng
dc.type	Thesis	eng
thesis.degree.discipline	Computer Science (UMKC)	eng
thesis.degree.grantor	University of Missouri--Kansas City	eng
thesis.degree.level	Masters	eng
thesis.degree.name	M.S.	eng

Files in this item

Name:: SlavovStuGosAlg.pdf
Size:: 680.6Kb
Format:: PDF

View/Open

This item appears in the following Collection(s)

Computer Science and Electrical Engineering Electronic Theses and Dissertations (UMKC)
The items in this collection are the scholarly output of UMKC graduate students.
2012 UMKC Theses - Freely Available Online
The items in this collection are the scholarly output of UMKC graduate students.

[-] Show simple item record