DeepSampling: Image Sampling Technique for Cost-Effective Deep Learning
Deep learning benefits from big data but becomes computationally expensive as data size increases. Severe data issues, such as highly skewed, sparse, or imbalanced data, can substantially influence the findings of machine learning. Because such data are complex, the ability to assess and evaluate the data is central to cost-effective deep learning. More specifically, in deep learning, choosing the right validation method is vital to ensuring that the validation process is accurate and unbiased. Current validation techniques, including k-fold cross-validation and random splitting into training and testing datasets, are hampered by the lack of systematic sampling informed by a comprehensive understanding of the data. In this thesis, we propose a sampling technique called DeepSampling that aims to achieve cost-effective deep learning for a given application. The proposed DeepSampling framework comprises two sampling schemes: (1) one that resolves imbalanced-data issues using Generative Adversarial Networks (GANs), and (2) one that provides an effective sampling technique based on clustering. The clustering techniques are based on the Mahalanobis distance metric and use t-SNE (t-distributed Stochastic Neighbor Embedding) to overcome data skewness and sparseness. DeepSampling has been evaluated with three deep learning models and four benchmark datasets: MNIST, Breast Histology, Malaria cell images, and Stanford Dogs. The results show that DeepSampling improves image classification accuracy by approximately 2-3% compared to traditional evaluation techniques on the same dataset.
Table of Contents
Introduction -- Background and related work -- Proposed framework -- Results and evaluations -- Conclusion and future work
M.S. (Master of Science)