AudioCNN: Audio Event Classification With Deep Learning Based Multi-Channel Fusion Networks
In recent years there has been growing interest in environmental sound classification, which has many real-world applications alongside established audio domains such as speech and music. Recent research has shown that deep learning models operating on spectral images outperform standard methods. This thesis designs a fusion system that combines several audio features, including the Spectrogram (SG), Chromagram (CG), and Mel-Frequency Cepstral Coefficients (MFCC), for effective environmental sound classification. We propose AudioCNN, a fusion network consisting of multiple Convolutional Neural Networks (CNNs) with aggregation methods over the spectral-image features, together with audio-specific data augmentation techniques. We conduct extensive experiments on benchmark environmental sound datasets, including UrbanSound8K, ESC-50, and ESC-10, and obtain state-of-the-art results, outperforming previous solutions. The experimental results show that combined features with lighter-weight CNN models outperform baseline environmental sound classification methods, and the proposed multi-channel fusion network with data augmentation achieves competitive results on UrbanSound8K compared to existing models.
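As a minimal sketch of the pipeline the abstract describes, the Python code below extracts the three named features (spectrogram, chromagram, MFCC) with librosa and fuses them through one lightweight CNN branch per feature, concatenating the branch embeddings before classification. All names here (extract_features, Branch, AudioFusionCNN), layer sizes, and hyperparameters are illustrative assumptions, not the thesis's actual architecture or aggregation method.

# Illustrative sketch only: one plausible multi-channel feature-fusion setup.
import librosa
import numpy as np
import torch
import torch.nn as nn

def extract_features(path, sr=22050, n_bins=128, frames=128):
    """Return a (3, n_bins, frames) stack: spectrogram, chromagram, MFCC."""
    y, _ = librosa.load(path, sr=sr)
    sg = librosa.amplitude_to_db(np.abs(librosa.stft(y, n_fft=2048, hop_length=512)))
    cg = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=512)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40, hop_length=512)
    def resize(m):
        # Crude resampling to a common (n_bins, frames) grid so the three
        # feature maps can be stacked as channels of one tensor.
        m = librosa.util.fix_length(m, size=frames, axis=1)
        idx = np.linspace(0, m.shape[0] - 1, n_bins).astype(int)
        return m[idx]
    return np.stack([resize(sg), resize(cg), resize(mfcc)])

class Branch(nn.Module):
    """One lightweight CNN per feature type, as in a multi-channel fusion net."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())  # -> (batch, 32)
    def forward(self, x):
        return self.net(x)

class AudioFusionCNN(nn.Module):
    """Three branches (SG, CG, MFCC) whose embeddings are concatenated."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.branches = nn.ModuleList(Branch() for _ in range(3))
        self.head = nn.Linear(3 * 32, n_classes)
    def forward(self, x):  # x: (batch, 3, n_bins, frames)
        feats = [b(x[:, i:i + 1]) for i, b in enumerate(self.branches)]
        return self.head(torch.cat(feats, dim=1))

# Usage (hypothetical file): stack three features and classify one clip.
# x = torch.from_numpy(extract_features("siren.wav")).unsqueeze(0).float()
# logits = AudioFusionCNN(n_classes=10)(x)

Concatenating fixed-size branch embeddings is only one of several possible aggregation methods (averaging or attention-weighted pooling are common alternatives); the abstract does not specify which the thesis uses.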
Table of Contents
Introduction -- Background -- Related work -- Methodology -- Results and evaluation -- Conclusion
M.S. (Master of Science)