dc.contributor.advisor | Zeng, Wenjun, 1967- | eng |
dc.contributor.author | Shah, Abhishek | eng |
dc.date.issued | 2015 | eng |
dc.date.submitted | 2015 Fall | eng |
dc.description.abstract | Twitter handles 500 million tweets per day and has 316 million monthly active users. The majority of tweets are written in natural language, which makes Twitter's data difficult to interpret programmatically. In our research, we attempt to address this challenge using various machine learning techniques. This thesis presents a new approach for classifying Twitter trends by adding a layer of feature selection and feature ranking. A variety of feature ranking algorithms, such as TF-IDF and bag-of-words, are used to facilitate the feature selection process. This helps surface the important features while reducing the feature space and making the classification process more efficient. Four Naïve Bayes text classifiers (one for each class), backed by these feature ranking and feature selection techniques, are used to successfully categorize Twitter trends. Using the bag-of-words and TF-IDF rankings, our research provides an average class precision improvement over current methodologies of 33.14% and 28.67%, respectively. | eng |
dc.identifier.uri | https://hdl.handle.net/10355/48617 | |
dc.language | English | eng |
dc.publisher | University of Missouri--Columbia | eng |
dc.relation.ispartofcommunity | University of Missouri--Columbia. Graduate School. Theses and Dissertations | eng |
dc.source | Submitted to MOspace by University of Missouri--Columbia Graduate Studies. | eng |
dc.title | Classification of Twitter trends using feature ranking and forward feature selection | eng |
dc.type | Thesis | eng |
thesis.degree.discipline | Computer science (MU) | eng |
thesis.degree.grantor | University of Missouri--Columbia | eng |
thesis.degree.level | Masters | eng |
thesis.degree.name | M.S. | eng |