Predicting stock price using sentiment analysis combining Twitter, search engine and investor intelligence data
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] In recent years, stock markets have become an integral part of the global economy, and any fluctuation in these markets influences our personal and corporate financial lives. A good prediction model for stock market forecasting is always highly desirable and would be of wide interest. Recent research suggests that very early indicators can be extracted from online social media (blogs, Twitter feeds, etc.) to predict changes in various economic and commercial indicators. In this project, daily sentiment features are generated from a Twitter dataset to build a high-accuracy prediction model for stock price movement. Google Search Queries and Investor Intelligence provide additional features to improve performance on weekly models. Five sentiment features (Mt-Positive, Mt-Negative, Bullishness, Message Volume, Agreement) are extracted from Twitter using sentiment analysis. Tweets that express opinions on stocks or indices are selected and classified from a Twitter dataset that holds more than 400 million records from July 31 to December 31, 2009. Four finance features (Return, Close, Trade Volume, Volatility) are generated for two market indices (NASDAQ-100 and the Dow Jones Industrial Average) and 13 leading technology companies. Next, correlations between each finance feature and all other features are calculated to verify their statistical relationships. Results show high correlations (up to 0.93 for DJIA with Close) between stock prices and Twitter sentiment. Because Twitter sentiment may affect stock price movement with a delay, time lags measured in weeks are also included in the experiments. Furthermore, with confidence from the correlations, several machine learning algorithms, such as Gaussian Process, Neural Network and Decision Stump, are applied to the feature set. Results show that reliable models are built, with strong correlations and low Root Mean Square Error (R: 0.94, RMSE: 0.065). 
Finally, a real-time prediction system is built with an additional component that uses the Twitter Streaming API to collect real-time Twitter data. Overall, the experimental results show that this prediction system works with satisfactory efficiency and accuracy.
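The week-lagged correlation check and the RMSE measure mentioned in the abstract can be sketched as follows. This is only a minimal illustration using the standard definitions of Pearson correlation and root mean square error; it is not the thesis's implementation. The function names (`pearson`, `lagged_correlation`, `rmse`) and the toy series are assumptions made for the example, not data or code from the project.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(sentiment, finance, lag):
    """Correlate sentiment in week t with a finance feature in week t + lag.

    Hypothetical helper: `lag` is a non-negative number of weeks.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(sentiment, finance)
    # Drop the last `lag` sentiment values and the first `lag` finance values
    # so that each sentiment value is paired with the later finance value.
    return pearson(sentiment[:-lag], finance[lag:])


def rmse(predicted, actual):
    """Root mean square error between predictions and observed values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


if __name__ == "__main__":
    # Toy weekly series (invented for illustration only).
    weekly_sentiment = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    weekly_close = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
    for lag in range(3):
        r = lagged_correlation(weekly_sentiment, weekly_close, lag)
        print(f"lag={lag} weeks, R={r:.3f}")
    print(f"RMSE={rmse([0.0, 0.0], [3.0, 4.0]):.3f}")
```

Comparing the coefficient across several lags is one simple way to see whether sentiment leads the finance feature by a week or more; which lags the thesis actually tested is not stated in the abstract.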
Access is limited to the campuses of the University of Missouri.