Naive Bayes algorithm for Twitter sentiment analysis and its implementation in MapReduce

Li, Zhaoyu

Files

research.pdf (5.64 MB)

public.pdf (2.26 KB)

short.pdf (37.36 KB)

Authors

Li, Zhaoyu

Date

2014

Format

Thesis

Abstract

Data has been growing exponentially in recent years. With the development of the information highway, data can be generated and collected so quickly, and in such volume, that it exceeds the limits of conventional processing methods and applications. Social networks are one of many areas of data explosion. Among social media platforms, Twitter has become one of the most important for sharing and communicating with friends. Sentiment analysis aims to determine the attitude of a speaker or writer with respect to some topic, or the overall contextual polarity of a document; sentiment analysis on Twitter has also been used in the past as a valid indicator of stock prices. Naive Bayes is one algorithm for performing sentiment analysis. The MapReduce programming model provides a simple and powerful way to implement distributed applications without requiring deep knowledge of parallel programming. When a new hypothetical MapReduce sentiment analysis system is built to meet a certain performance goal, we lack a benchmark, and the traditional trial-and-error approach is extremely time-consuming and costly. In this thesis we implemented a prototype system using Naive Bayes to find the correlation between geographical sentiment on Twitter and the stock price behavior of companies. We also implemented the Naive Bayes sentiment analysis algorithm in the MapReduce model on Hadoop, and evaluated the algorithm on a large amount of Twitter data with different metrics. Based on the evaluation results, we provided a comprehensive MapReduce performance prediction model for Naive Bayes based sentiment analysis. The prediction model can predict task execution performance within a window, and can also be used by other MapReduce systems as a benchmark to improve performance.

URI

https://hdl.handle.net/10355/45675

Degree

M.S.

Thesis Department

Computer science (MU)

Collections

2014 MU theses - Freely available online
Computer Science electronic theses and dissertations (MU)
