Text mining with neural network and MapReduce
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] The growing volume of data on the internet can support business processes such as product development, inventory management, and quality management by providing a measure of customer satisfaction. This source of information is becoming more important as e-business plays a larger role in world commerce. However, most internet data is unstructured: newspaper articles, blogs, and user comments. Such data is regarded as a qualitative resource, whereas business practice prefers to analyze quantitative measures such as sales volume, production quantity, and inventory counts. With recent advances in machine learning, data scientists have begun to exploit unstructured data for useful information. One such application is text mining, which analyzes customer sentiment from the text of reviews. This research aims to classify the polarity of customer reviews as positive or negative. While other studies in this field have focused on support vector machines at the document level, this research analyzes reviews at the sentence level by combining natural language processing with a neural network classifier. Natural language processing can extract more accurate features from text documents by taking syntactic and semantic order within sentences into account, and it then summarizes each document as a set of reduced-dimension features. A neural network classifier can give superior results (Moraes, Valiati, & Neto, 2013) and works well with reduced features. Reduced-dimension features are important when working with large datasets. The proposed method runs the neural network in the MapReduce framework, which is used for parallel programming. This approach is advantageous when the program works with growing volumes of unstructured data on a distributed file storage system. The results show that the natural language processing method improves classification performance. When the number of parallel jobs is doubled, classification time is reduced by half. However, parallel jobs are effective only if the dataset remains large enough after the necessary classification features have been extracted.
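As a rough illustration of how sentence-level classification might fit a map/reduce pattern, the sketch below is not the author's implementation. The cue-word lexicon, the single sigmoid unit standing in for a trained neural network, the fixed weights, and the majority-vote reducer are all assumptions made for illustration; the abstract does not specify any of them. The map step scores one sentence at a time, so it could in principle be split across parallel jobs.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon standing in for the NLP feature-extraction step.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "broken"}


def extract_features(sentence):
    """Reduce a sentence to two counts: positive and negative cue words."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return (sum(w in POSITIVE for w in words),
            sum(w in NEGATIVE for w in words))


def classify(features, weights=(1.5, -1.5), bias=0.0):
    """Single sigmoid unit (placeholder for a trained network).

    Returns +1 for positive, -1 for negative; a neutral sentence
    (activation exactly 0.5) is counted as positive.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else -1


def map_phase(record):
    """Map: (doc_id, sentence) -> (doc_id, sentence polarity vote)."""
    doc_id, sentence = record
    return doc_id, classify(extract_features(sentence))


def reduce_phase(doc_id, votes):
    """Reduce: majority vote over sentence polarities (assumed rule)."""
    return doc_id, "positive" if sum(votes) >= 0 else "negative"


def run_job(records):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    grouped = defaultdict(list)
    for doc_id, vote in map(map_phase, records):
        grouped[doc_id].append(vote)
    return dict(reduce_phase(d, v) for d, v in grouped.items())


records = [
    ("r1", "Great phone, I love it."),
    ("r1", "The battery is poor."),
    ("r1", "Excellent screen!"),
    ("r2", "Broken on arrival."),
    ("r2", "I hate the case."),
]
result = run_job(records)
```

Because each call to `map_phase` depends only on its own record, the records could be partitioned across workers; in a real MapReduce deployment the framework would handle the grouping step that `run_job` simulates here.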
Access to files is limited to the University of Missouri--Columbia.