Design of Multi-modality Deep Fusion Architecture for Deep Acoustic Analytics
Date
2021

Abstract
Audio classification research is receiving increasing attention as a foundation for emerging applications such as environmental monitoring, health care, and smart cities. The field must adapt dynamically to changes in the environment and incorporate innovative and effective enhancement techniques. Although a growing amount of audio data is available for environmental audio classification, conducting accurate deep learning in the environmental and health audio domains remains challenging. These challenges arise from the diversity of fields and categories involved, e.g., environmental sounds, animal sounds, noise, and human body sounds. In particular, distortion, fracture, and noise in audio data are the primary obstacles to accurate environmental audio classification. Although the detection of audio sounds has advanced considerably, classification accuracy still depends on how features are extracted from the audio, how each modality represents the audio data, and how much noise is present in the raw audio.
With the rapid growth of environmental datasets, extracting relevant data to build an adequate environmental sound classifier is a crucial challenge. Deep learning is an advanced and promising approach to detecting, predicting, and classifying different types of sounds, and convolutional neural networks (CNNs) have become the dominant technique for classifying environmental and health audio data. A CNN distinguishes between different types of sounds by capturing patterns across time and frequency and applying them to different features. For a neural network model to perform efficiently, training requires a large amount of data. Many researchers are pursuing these ideas today, but the research is still not mature, largely due to the lack of available datasets. Moreover, noise in sound data or unstructured data may degrade a classifier's performance, and audio data is often too complicated for a model to capture its multiple characteristics and latent patterns.
To address these problems in acoustic classification, we have developed novel fusion architectures based on convolutional neural networks for multi-feature, multi-modality fusion-based audio classification. We propose two multi-modality fusion architectures: Deep Acoustics (DA) and Multimodal Deep Acoustics (MDA). The contributions are: (1) we analyze how data issues such as noise and class imbalance affect the performance of multi-modality models on input audio clips; (2) we extract various acoustic features for the multi-feature fusion network, such as the log-mel spectrogram, chromagram, and mel-frequency cepstral coefficients (MFCCs); (3) we develop effective data augmentation and normalization methods to enhance the quality and comparability of sparse audio samples with various extended features; and (4) we evaluate the proposed models with two types of fusion approaches: multi-feature-based model fusion and network-based fusion. Our experimental results validate the benefits of the proposed work for audio classification tasks; in particular, our fusion models with multiple features and modalities are highly efficient. In comprehensive testing on numerous benchmark datasets, the proposed models outperformed state-of-the-art solutions. The proposed deep acoustic analytics approaches have been applied in the environmental sound detection and healthcare domains to detect lung and heart conditions. Furthermore, our models achieve better results than previous methods in the literature even with smaller networks (i.e., fewer convolutional and hidden layers, fewer trainable parameters, fewer training epochs, and less time consumption).
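As a rough illustration of the feature-level fusion idea described above, the sketch below normalizes several heterogeneous acoustic feature maps (log-mel spectrogram, chromagram, MFCCs) and stacks them as input channels for a CNN. This is a minimal sketch, not the thesis's actual implementation: the feature shapes, the z-score normalization, and the zero-padding to a common height are all illustrative assumptions, and random arrays stand in for real extracted features.

```python
import numpy as np

# Stand-ins for extracted features of one audio clip (shapes are illustrative,
# not taken from the thesis): 128 mel bands, 12 chroma bins, 40 MFCCs,
# each over 431 time frames.
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((128, 431))
chroma = rng.standard_normal((12, 431))
mfcc = rng.standard_normal((40, 431))

def normalize(feat):
    """Z-score normalization per feature map: one plausible way to make
    heterogeneous features comparable before fusing them."""
    return (feat - feat.mean()) / (feat.std() + 1e-8)

def fuse_features(features, target_bins=128):
    """Early (feature-level) fusion: zero-pad each feature map to a common
    height, then stack the maps as input channels for a CNN."""
    channels = []
    for f in features:
        f = normalize(f)
        pad = target_bins - f.shape[0]  # rows of zeros to reach target_bins
        channels.append(np.pad(f, ((0, pad), (0, 0))))
    return np.stack(channels)  # shape: (n_features, target_bins, n_frames)

fused = fuse_features([log_mel, chroma, mfcc])
print(fused.shape)  # (3, 128, 431)
```

The stacked tensor can then be fed to a 2-D convolutional network, letting the first convolutional layer learn cross-feature patterns; a network-based fusion variant would instead process each feature map in its own sub-network and merge the resulting embeddings.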
Table of Contents
Introduction -- Deep Acoustic Model -- Multimodality for Deep Acoustics -- Deep Network-based Fusion for Deep Acoustics Learning -- Feature-based Fusion Learning for Deep Acoustics -- Network Model Discussion -- Conclusion and Future Work
Degree
Ph.D. (Doctor of Philosophy)