Text mining with neural network and MapReduce

Nguyen, Nguyen Phuoc

URI

https://hdl.handle.net/10355/48652

dc.contributor.advisor	Middlekoop, Timothy	eng
dc.contributor.author	Nguyen, Nguyen Phuoc	eng
dc.date.issued	2015	eng
dc.date.submitted	2015 Summer	eng
dc.description.abstract	[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] The growing volume of data on the internet can provide information that supports business processes such as product development, inventory management, and quality management by measuring customer satisfaction. This source of information is becoming more important because e-business now plays a larger role in world commerce. However, most internet data are unstructured: newspaper articles, blogs, and user comments. They are seen as qualitative resources, whereas business practice prefers to analyze sales volume, production quantity, and inventory levels. Recently, with advances in machine learning, data scientists have begun to exploit unstructured data for useful information. One such application is text mining, a form of data mining, to analyze customer sentiment from the text of reviews. This research aims to investigate and classify the polarity of customer reviews as positive or negative opinions. While other studies in this field have focused on the support vector machine method at the document level, this research analyzes reviews at the sentence level by combining a natural language processing method with a neural network classifier. Natural language processing can extract more accurate features from text documents by considering syntactic and semantic order at the sentence level. It then summarizes each document as a set of reduced-dimension features. A neural network classifier can give superior results (Moraes, Valiati, & Neto, 2013) and works well with reduced features. Reduced-dimension features are important when the project works with large datasets. The proposed method applies the neural network within the MapReduce framework, which is used for parallel programming. This approach has an advantage when the program works with growing unstructured data on a distributed file storage system. The results show that the natural language processing method improves classification performance. 
When the program doubles the number of parallel jobs, classification time is reduced by half. However, parallel jobs improve running time only if the datasets are still large enough after the features needed for classification have been extracted.	eng
dc.identifier.uri	https://hdl.handle.net/10355/48652
dc.language	English	eng
dc.publisher	University of Missouri--Columbia	eng
dc.relation.ispartofcommunity	University of Missouri--Columbia. Graduate School. Theses and Dissertations	eng
dc.rights	Access to files is limited to the University of Missouri--Columbia.	eng
dc.title	Text mining with neural network and MapReduce	eng
dc.type	Thesis	eng
thesis.degree.discipline	Industrial and manufacturing systems engineering (MU)	eng
thesis.degree.grantor	University of Missouri--Columbia	eng
thesis.degree.level	Masters	eng
thesis.degree.name	M.S.	eng

Files in this item

Name: public.pdf
Size: 2.440Kb
Format: PDF

Name: research.pdf
Size: 2.528Mb
Format: PDF

Name: short.pdf
Size: 135.2Kb
Format: PDF

This item appears in the following Collection(s)

2015 MU theses - Access restricted to MU
Industrial and Manufacturing Systems Engineering electronic theses and dissertations (MU)
The electronic theses and dissertations of the Department of Industrial and Manufacturing Systems Engineering.