StoryNet: A 5W1H-based knowledge graph to connect stories
Stories are a powerful medium through which human communities have exchanged information since the dawn of the information age. They take many forms: articles, movies, books, plays, short films, magazines, mythologies, and more. As information representation, exchange, and interaction grow ever more complex, it becomes increasingly important to find ways to convey stories more effectively. In a world that is diverging more and more, it is harder to draw parallels and connect information from around the globe. Although there have been large-scale efforts to consolidate information, such as Wikipedia and Wikidata, they lack coverage of real-time happenings. Building on recent advances in Natural Language Processing (NLP), we propose a framework that connects stories, making it easier to find the links between them and thereby helping us understand and explore those links and the possibilities that revolve around them. Our framework is based on the 5W + 1H (What, Who, Where, When, Why, and How) format, which represents stories in a form that is both easily understandable by humans and accurately generated by deep learning models. As case studies, we used 311-call and cyber-security datasets, applying NLP techniques such as classification, topic modelling, question answering, and question generation alongside the 5W1H framework to segregate the stories into clusters. The framework is generic and can be applied to any field. We evaluated two approaches for generating results: training-based and rule-based. For the rule-based approach, we used Stanford NLP parsers to identify patterns for the 5W + 1H terms; for the training-based approach, we used BERT embeddings. The two were compared using an ensemble score (the average of CoLA, SST-2, MRPC, QQP, STS-B, MNLI, QNLI, and RTE) along with BLEU and ROUGE scores.
For the training-based analysis, we studied several models (BERT, RoBERTa, XLNet, ALBERT, ELECTRA, and the AllenNLP Transformer QA model) on the CVE, NVD, SQuAD v1.1, and SQuAD v2.0 datasets, and compared them against custom annotations for identifying 5W + 1H. The performance and accuracy of both approaches are presented in the results section. Our method improved the score from 30% (baseline) to 91% when trained on the 5W + 1H annotations.
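To make the 5W + 1H idea concrete, the sketch below shows a greatly simplified, cue-based extractor. It is purely illustrative: the `CUES` patterns and the `extract_5w1h` function are assumptions invented for this example, not the thesis's actual pipeline, which relies on Stanford NLP parsers (rule-based) and transformer QA models (training-based) rather than hand-written regular expressions.

```python
import re

# Illustrative, hand-written surface cues for a few 5W1H slots.
# The real pipeline uses Stanford NLP parse patterns or trained
# QA models; these toy regexes only sketch the slot-filling idea.
CUES = {
    "when": re.compile(
        r"\b(on|in)\s+((?:January|February|March|April|May|June|July|"
        r"August|September|October|November|December)\s+\d{1,2},?\s*\d{4}|\d{4})"
    ),
    "where": re.compile(r"\b(?:in|at)\s+([A-Z][a-zA-Z]+(?:\s[A-Z][a-zA-Z]+)*)"),
    "why": re.compile(r"\bbecause\s+(.+?)(?:[.,]|$)"),
    "how": re.compile(r"\bby\s+(\w+ing\b.*?)(?:[.,]|$)"),
}

def extract_5w1h(sentence: str) -> dict:
    """Fill as many 5W1H slots as the simple cues allow; unmatched slots stay None."""
    slots = {k: None for k in ("who", "what", "when", "where", "why", "how")}
    # Crude "who": a leading capitalized noun phrase at the start of the sentence.
    m = re.match(r"^([A-Z][a-zA-Z]+(?:\s[A-Z][a-zA-Z]+)*)", sentence)
    if m:
        slots["who"] = m.group(1)
    for slot, pattern in CUES.items():
        m = pattern.search(sentence)
        if m:
            # The last capture group holds the slot value in each pattern.
            slots[slot] = m.groups()[-1].strip()
    return slots
```

A sentence such as "Alice reported the outage in Richmond on March 3, 2021 because the server failed." would yield Who = "Alice", Where = "Richmond", When = "March 3, 2021", and Why = "the server failed", with the remaining slots left unfilled; extracting slots reliably across varied phrasings is precisely what motivates the parser- and model-based approaches in the thesis.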
Table of Contents
Introduction -- Related work -- The 5W1H Framework and the models included -- StoryNet Application: Evaluation and Results -- Conclusion and Future Work
M.S. (Master of Science)