A study of gossip algorithms for internet-scale cardinality estimation of distributed XML data

MOspace/Manakin Repository

Breadcrumbs Navigation

A study of gossip algorithms for internet-scale cardinality estimation of distributed XML data

Please use this identifier to cite or link to this item: http://hdl.handle.net/10355/15636

[-] show simple item record

dc.contributor.advisor Rao, Praveen R. en
dc.contributor.author Slavov, Vasil Georgiev
dc.date.accessioned 2012-10-03T15:45:14Z
dc.date.available 2012-10-03T15:45:14Z
dc.date.issued 2012-10-03
dc.date.submitted 2012 Summer en
dc.identifier.uri http://hdl.handle.net/10355/15636
dc.description Title from PDF of title page, viewed on October 3, 2012 en
dc.description Thesis advisor: Praveen R. Rao en
dc.description Vita en
dc.description Includes bibliographic references (p. 58-69) en
dc.description Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2012 en
dc.description.abstract After more than a decade of active research and development, the peer-to-peer (P2P) computing model continues to be successful. We have witnessed the deployment of commercial P2P applications in large, Internet-scale environments. With the rise and growth of P2P, indexing and querying data stored in large-scale sharing systems has become increasingly di cult. Computing statistics over data stored in Internet-scale P2P systems is an important component of query optimization. Decentralized gossip-based protocols are very popular in networking, and in particular, in sensor networks. The simplicity and scalability of gossip protocols render them perfect for quickly computing accurate estimates of aggregates (sums, averages, etc.) in Internet-scale systems where node and link failures are the norm. In this thesis, we present the problem of cardinality estimation of XPath queries over XML data stored in a distributed, Internet-scale environment. We focus our work on three objectives: implementing gossip in an Internet-scale environment, conducting a comprehensive performance evaluation in a wide-area network, and analyzing the experimental results. We implement two gossip-based algorithms (VanillaXGossip and XGossip) which, given an XPath query, estimate the number of XML documents in the network that contain a match for the query. XGossip employs a new, divide-and-conquer strategy for load-balancing and reducing the bandwidth consumption. We conduct a comprehensive performance evaluation of both gossip algorithms on Amazon Elastic Compute Cloud (Amazon EC2) web service using a heterogeneous collection of XML documents. The goal of the performance evaluation is to nd if the results we obtain are consistent with the theoretical analysis of VanillaXGossip and XGossip. en_US
dc.description.tableofcontents Introduction -- Background and motivations -- The design of VanillaXGossip and XGossip -- Implementation of VanillaXGossip and XGossip -- Evaluation -- Conclusion and future work -- Appendix A. Algorithms -- Appendix B. XPath grammar en
dc.format.extent xi, 70 pages en
dc.language.iso en_US en_US
dc.publisher University of Missouri--Kansas City en
dc.subject.lcsh Peer-to-peer architecture (Computer networks) en
dc.subject.lcsh Querying (Computer science) en
dc.subject.other Thesis -- University of Missouri--Kansas City -- Computer science en
dc.title A study of gossip algorithms for internet-scale cardinality estimation of distributed XML data en_US
dc.type Thesis en_US
thesis.degree.discipline Computer Science en
thesis.degree.grantor University of Missouri--Kansas City en
thesis.degree.name M.S. en
thesis.degree.level Masters en


This item appears in the following Collection(s)

[-] show simple item record