dc.contributor.advisor | Rao, Praveen R. | eng
dc.contributor.author | Wang, Li | eng
dc.date.issued | 2015 | eng
dc.date.submitted | 2015 Spring | eng
dc.description | Title from PDF of title page, viewed on July 31, 2015 | eng
dc.description | Thesis advisor: Praveen R. Rao | eng
dc.description | Vita | eng
dc.description | Includes bibliographic references (pages 54-58) | eng
dc.description | Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2015 | eng
dc.description.abstract | There is an increasing amount of healthcare-related data available on Twitter. Due to Twitter’s popularity, a large number of clinical tweets are posted on this microblogging platform every day. One interesting problem we face today is the classification of clinical tweets so that the classified tweets can be readily consumed by new healthcare applications. While several tools are available to classify small datasets, the size of Twitter data demands new tools and techniques for fast and accurate classification. Motivated by these reasons, we propose a new tool called Clinical Tweets Classifier (CTC) to enable scalable classification of clinical content on Twitter. CTC uses Apache Mahout and, in addition to the keywords and hashtags in the tweets, leverages the SNOMED CT clinical terminology and a new tweet influence scoring scheme to construct high-accuracy models for classification. CTC uses the Naïve Bayes algorithm. We trained four models based on different feature sets such as hashtags, keywords, clinical terms from SNOMED CT, and so on. We selected the training and test datasets based on the influence score of the tweets. We validated the accuracy of these models using a large number of tweets. Our results show that using SNOMED CT terms and a training dataset with more influential tweets yields the most accurate model for classification. We also tested the scalability of CTC using 100 million tweets on a small cluster. | eng
dc.description.tableofcontents | Introduction -- Background and related work -- Design and framework -- Evaluation -- Conclusion and future work | eng
dc.format.extent | xi, 59 pages | eng
dc.identifier.uri | https://hdl.handle.net/10355/46336 | eng
dc.subject.lcsh | Twitter | eng
dc.subject.lcsh | Medical telematics | eng
dc.subject.lcsh | Data mining -- Computer programs | eng
dc.subject.lcsh | Data mining -- Software | eng
dc.subject.other | Thesis -- University of Missouri--Kansas City -- Computer science | eng
dc.title | Classification of Clinical Tweets Using Apache Mahout | eng
dc.type | Thesis | eng
thesis.degree.discipline | Computer Science (UMKC) | eng
thesis.degree.grantor | University of Missouri--Kansas City | eng
thesis.degree.level | Masters | eng
thesis.degree.name | M.S. | eng