AudioCNN: Audio Event Classification With Deep Learning Based Multi-Channel Fusion Networks
In recent years there has been growing interest in environmental sound classification, which has many real-world applications alongside established audio domains such as speech and music. Recent research has shown that deep learning models operating on spectral images outperform standard methods. This thesis designs a fusion system that combines several audio features, including the Spectrogram (SG), Chromagram (CG), and Mel-Frequency Cepstral Coefficients (MFCC), for effective environmental sound classification. We propose AudioCNN, a fusion network consisting of multiple Convolutional Neural Networks (CNNs) with aggregation methods over the spectral-image features, together with audio-specific data augmentation techniques. We conduct extensive experiments on benchmark environmental sound datasets, including UrbanSound8K, ESC-50, and ESC-10, and obtain state-of-the-art results, outperforming previous solutions. The experimental results show that combined features with lighter-weight CNN models outperform baseline environmental sound classification methods, and the proposed multi-channel fusion network with data augmentation achieves competitive results on UrbanSound8K compared to existing models.
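As a minimal sketch of the pipeline the abstract describes, the Python code below extracts the three named features (spectrogram, chromagram, MFCC) with librosa and fuses them through one lightweight CNN branch per feature, concatenating the branch embeddings before classification. All names here (extract_features, Branch, AudioFusionCNN), layer sizes, and hyperparameters are illustrative assumptions, not the thesis's actual architecture or aggregation method.

# Illustrative sketch only: one plausible multi-channel feature-fusion setup.
import librosa
import numpy as np
import torch
import torch.nn as nn

def extract_features(path, sr=22050, n_bins=128, frames=128):
    """Return a (3, n_bins, frames) stack: spectrogram, chromagram, MFCC."""
    y, _ = librosa.load(path, sr=sr)
    sg = librosa.amplitude_to_db(np.abs(librosa.stft(y, n_fft=2048, hop_length=512)))
    cg = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=512)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40, hop_length=512)
    def resize(m):
        # Crude resampling to a common (n_bins, frames) grid so the three
        # feature maps can be stacked as channels of one tensor.
        m = librosa.util.fix_length(m, size=frames, axis=1)
        idx = np.linspace(0, m.shape[0] - 1, n_bins).astype(int)
        return m[idx]
    return np.stack([resize(sg), resize(cg), resize(mfcc)])

class Branch(nn.Module):
    """One lightweight CNN per feature type, as in a multi-channel fusion net."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())  # -> (batch, 32)
    def forward(self, x):
        return self.net(x)

class AudioFusionCNN(nn.Module):
    """Three branches (SG, CG, MFCC) whose embeddings are concatenated."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.branches = nn.ModuleList(Branch() for _ in range(3))
        self.head = nn.Linear(3 * 32, n_classes)
    def forward(self, x):  # x: (batch, 3, n_bins, frames)
        feats = [b(x[:, i:i + 1]) for i, b in enumerate(self.branches)]
        return self.head(torch.cat(feats, dim=1))

# Usage (hypothetical file): stack three features and classify one clip.
# x = torch.from_numpy(extract_features("siren.wav")).unsqueeze(0).float()
# logits = AudioFusionCNN(n_classes=10)(x)

Concatenating fixed-size branch embeddings is only one of several possible aggregation methods (averaging or attention-weighted pooling are common alternatives); the abstract does not specify which the thesis uses.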
Table of Contents
Introduction -- Background -- Related work -- Methodology -- Results and evaluation -- Conclusion
M.S. (Master of Science)