
dc.contributor.advisor: Lee, Yugyung, 1960-
dc.contributor.author: Nagireddy, Srichakradhar Reddy
dc.date.issued: 2021
dc.date.submitted: 2021 Fall
dc.description: Title from PDF of title page viewed January 19, 2022
dc.description: Thesis advisor: Yugyung Lee
dc.description: Vita
dc.description: Includes bibliographical references (pages 149-164)
dc.description: Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2021
dc.description.abstract: Stories are a powerful medium through which the human community has exchanged information since the dawn of the information age. They take many forms, including articles, movies, books, plays, short films, magazines, and mythologies. As information representation, exchange, and interaction grow ever more complex, it has become important to find ways to convey stories more effectively. In an increasingly divergent world, it is harder to draw parallels between pieces of information from around the globe and connect them. Large-scale efforts to consolidate information, such as Wikipedia and Wikidata, exist, but they do not capture real-time events. Building on recent advances in Natural Language Processing (NLP), we propose a framework that connects stories, making it easier to find, understand, and explore the links between them and the possibilities that revolve around them. The framework is based on the 5W + 1H (What, Who, Where, When, Why, and How) format, which represents stories in a form that humans can easily understand and that deep learning models can generate accurately. We used 311 calls and cyber security datasets as case studies, applying NLP techniques such as classification, topic modelling, question answering, and question generation together with the 5W1H framework to group the stories into clusters. The framework is generic and can be applied to any field. We evaluated two approaches for generating results: rule-based and training-based. The rule-based approach uses Stanford NLP parsers to identify patterns for the 5W + 1H terms, and the training-based approach uses BERT embeddings. The two approaches were compared using an ensemble score (the average of scores on the GLUE tasks CoLA, SST-2, MRPC, QQP, STS-B, MNLI, QNLI, and RTE) along with BLEU and ROUGE scores.
For the training-based analysis, we studied BERT, RoBERTa, XLNet, ALBERT, ELECTRA, and AllenNLP Transformer QA with the CVE, NVD, SQuAD v1.1, and SQuAD v2.0 datasets, and compared them against custom annotations for identifying 5W + 1H. The results section presents the performance and accuracy of both approaches. Training on the 5W+1H annotations raised the score from 30% (baseline) to 91%.
dc.description.tableofcontents: Introduction -- Related work -- The 5W1H Framework and the models included -- StoryNet Application: Evaluation and Results -- Conclusion and Future Work
dc.format.extent: xii, 166 pages
dc.identifier.uri: https://hdl.handle.net/10355/88647
dc.subject.lcsh: Natural language processing (Computer science)
dc.subject.lcsh: Question-answering systems
dc.subject.other: Thesis -- University of Missouri--Kansas City -- Computer science
dc.title: StoryNet: A 5W1H-based knowledge graph to connect stories
thesis.degree.discipline: Computer Science (UMKC)
thesis.degree.grantor: University of Missouri--Kansas City
thesis.degree.level: Masters
thesis.degree.name: M.S. (Master of Science)
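The abstract describes reducing each story to its 5W + 1H answers and then linking stories into a knowledge graph. The following is a minimal sketch of that idea, not the thesis's implementation: it assumes the six answers have already been extracted (for example by a QA model), and it links two stories whenever they share an identical, case-normalized value in the same slot. The names Story5W1H and link_stories are hypothetical, and exact-match linking is a deliberate simplification of whatever matching StoryNet actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# The six 5W + 1H slots named in the abstract.
FIELDS = ("what", "who", "where", "when", "why", "how")


@dataclass(frozen=True)
class Story5W1H:
    """Hypothetical record: one story reduced to its 5W + 1H answers (None = unknown)."""
    story_id: str
    what: Optional[str] = None
    who: Optional[str] = None
    where: Optional[str] = None
    when: Optional[str] = None
    why: Optional[str] = None
    how: Optional[str] = None


def link_stories(stories: List[Story5W1H]) -> Dict[Tuple[str, str], List[str]]:
    """Return {(id_a, id_b): [shared slots]} for every pair of stories that share
    at least one identical (trimmed, lower-cased) non-empty 5W1H value.
    Exact matching is an assumption made for illustration only."""
    by_value = defaultdict(set)  # (slot, normalized value) -> set of story ids
    for story in stories:
        for slot in FIELDS:
            value = getattr(story, slot)
            if value:
                by_value[(slot, value.strip().lower())].add(story.story_id)

    edges = defaultdict(list)  # each pair of ids is one undirected graph edge
    for (slot, _value), ids in by_value.items():
        for a, b in combinations(sorted(ids), 2):
            edges[(a, b)].append(slot)
    return dict(edges)
```

Grouping by (slot, value) first avoids comparing every pair of stories directly; only stories that actually share a value are paired. A real system would likely replace exact matching with entity linking or embedding similarity.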
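The ensemble score mentioned in the abstract is the average of eight per-task scores (CoLA, SST-2, MRPC, QQP, STS-B, MNLI, QNLI, and RTE). A minimal sketch of that arithmetic follows, assuming an unweighted mean and that every task score is already on one common scale (for example 0 to 100). The record does not say which metric was used per task, and the GLUE tasks normally report different metrics (Matthews correlation for CoLA, correlation for STS-B, accuracy or F1 for the others), so the input format and the function name ensemble_score are assumptions.

```python
# The eight tasks listed in the abstract's definition of the ensemble score.
GLUE_TASKS = ("CoLA", "SST-2", "MRPC", "QQP", "STS-B", "MNLI", "QNLI", "RTE")


def ensemble_score(task_scores: dict) -> float:
    """Unweighted mean of the eight task scores (hypothetical helper).

    Assumes every score is already on the same scale; raises ValueError if any
    of the eight tasks is missing so a partial average is never reported."""
    missing = [task for task in GLUE_TASKS if task not in task_scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return sum(task_scores[task] for task in GLUE_TASKS) / len(GLUE_TASKS)
```

Refusing to average a partial set keeps scores from different models comparable, since dropping a hard task would silently inflate the mean.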
