Webpage rank using Bayes' rule and connected components
Metadata[+] Show full item record
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] The PageRank is one of the most famous link-structure based webpage ranking algorithm adopted by Google which measures the importance of the webpages by their probability of being visited. However, the bottleneck of the PageRank computing speed is the power iteration process applied on the huge webgraph of WWW, which involves approximately 50 billion webpages. In this thesis a novel method named CCRank is proposed. The CCRank first decomposes the webgraph into mutually disconnected sub-webgraphs, each called a connected component (CC) or simply a block. A Bayes' theorem based block-wise local PageRank computation and weighting scheme are then used to compute the final ranking vector of CCRank, which approximates the one of PageRank algorithm. The computation can be accelerated through distribute the local PageRank computation to multiple processors. Further, unlike PageRank, the updating process for local perturbations is less computational expensive in CCRank. In order to evaluate our approach, the webgraph data is simulated based on real web characteristics. Experimental results indicate that the CCRank computing process is 92% faster on average than PageRank. Moreover, the orders of the top ranked pages in the two approaches are matched above 99%.
Access to files is limited to the University of Missouri--Columbia.