Shared Context through Multi-Level Attention Transformers for Text Classification
Natural language processing (NLP) has seen explosive recent growth in the pursuit of artificial intelligence with human-level language understanding. Contextual understanding through attention mechanisms can be further improved by fine-tuning model composition for tasks such as classification, question answering, and topic modeling. Real-world datasets are considerably more complex and tend to require multi-fold models. Such models tend to be larger, deeper, and more complicated; for example, BERT has 340 million parameters, Turing-NLG has 17 billion, and GPT-3 has about 175 billion. Deploying them demands immense computational resources to process the text corpus during both training and inference. This thesis proposes a novel deep learning architecture for scalable multi-fold text classification that extends BERT by sharing context across abstraction levels of domains. Four deep learning models (BERT flat, BERT hierarchical, BERT hierarchical tuned, and BERT feature-extracted) are proposed for the multi-label attention transformers on this architecture. The proposed models overcome competing limitations by training concurrently and providing predictions for an additional level of classes simultaneously. Our work addresses the limitations of knowledge distillation and transfer learning, which are neither scalable nor sustainable and are also costly. We performed experiments to validate model reliability using both benchmark and real-world data (KCMO 311 data). Quantitative results confirm that the proposed models reduce computational requirements while providing competitive accuracy.
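To make the shared-context idea concrete, the following is a minimal sketch (not the thesis code, and with every function and dimension invented for illustration): a single encoder produces one context vector, which two classification heads at different abstraction levels read simultaneously, so the two levels of predictions share the same representation instead of requiring separate models.

```python
# Hypothetical sketch: one shared "encoder" feeding two label-level heads.
# A real implementation would use a BERT encoder; here a toy hashing encoder
# stands in so the example is self-contained and runnable.
import math
import random

random.seed(0)

def encode(text, dim=8):
    # Stand-in for a BERT encoder: hash tokens into a fixed-size,
    # L2-normalized context vector shared by all heads.
    vec = [0.0] * dim
    for tok in text.split():
        vec[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def softmax_head(vec, weights):
    # One linear classification head per abstraction level; every head
    # reads the SAME context vector, so levels are predicted concurrently.
    scores = [sum(w * v for w, v in zip(row, vec)) for row in weights]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

DIM = 8
# Invented label counts: 3 coarse-level classes, 10 fine-level classes.
coarse_w = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
fine_w = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(10)]

ctx = encode("pothole reported on main street near downtown")
coarse_probs = softmax_head(ctx, coarse_w)  # prediction at the coarse level
fine_probs = softmax_head(ctx, fine_w)      # prediction at the fine level
```

Because both heads consume one encoder pass, adding an extra level of classes costs only one more small linear head rather than a second full model, which is the efficiency argument the abstract makes.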
Table of Contents
Introduction -- Related work -- Multi-Level Attention Transformers -- Evaluation and Results -- Conclusion and Future Work
M.S. (Master of Science)