Active labeling in deep learning and its application to emotion prediction
Recent breakthroughs in deep learning have made it possible to learn deep, layered, hierarchical representations of sensory input. Stacked restricted Boltzmann machines (RBMs), also called deep belief networks (DBNs), and stacked autoencoders are two representative deep learning methods. The key idea is greedy layer-wise unsupervised pre-training followed by supervised fine-tuning, which can be done efficiently and overcomes the difficulty of local minima encountered when training all layers of a deep neural network at once. Deep learning has been shown to achieve outstanding performance in a number of challenging real-world applications. Existing deep learning methods involve a large number of meta-parameters, such as the number of hidden layers, the number of hidden nodes, the sparsity target, the initial values of the weights, the type of units, and the learning rate. Existing applications usually do not explain why particular choices were made or how changing them would affect performance, so it is difficult for a novice user to make good decisions for a new application. In addition, most existing work uses simple, clean datasets and assumes a fixed set of labeled data, which is not necessarily realistic for real-world applications. The main objectives of this dissertation are to investigate the optimal meta-parameters of deep learning networks and the effects of various data pre-processing techniques; to propose a new active labeling framework for cost-effective selection of labeled data; and to apply deep learning to a real-world application, emotion prediction from physiological sensor data that is complex, noisy, and heterogeneous.
For meta-parameters and data pre-processing techniques, this study uses the benchmark MNIST digit recognition image dataset and a sleep-stage-recognition sensor dataset, and empirically compares the deep network's performance under a number of different meta-parameters and design decisions: raw data vs. data pre-processed by Principal Component Analysis (PCA) with or without whitening; various structures in terms of the number of layers and the number of nodes in each layer; and stacked RBMs vs. stacked autoencoders. For active labeling, a new framework for both stacked RBMs and stacked autoencoders is proposed based on three metrics: least confidence, margin sampling, and entropy. On the MNIST dataset, the proposed methods consistently outperform random labeling by a significant margin. On the sleep-stage-recognition dataset, by contrast, they perform similarly to random labeling because of the noisiness and inconsistency of the data. For the application of deep learning to emotion prediction from physiological sensor data, a software pipeline has been developed. The system first extracts features from the raw data of four channels in an unsupervised fashion and then builds three classifiers on the learned features to classify the levels of arousal, valence, and liking. The classification accuracies are 0.609, 0.512, and 0.684 for arousal, valence, and liking, respectively, which is comparable with existing methods based on expert-designed features.
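The three active labeling metrics named above can be illustrated with a minimal sketch. It uses the textbook definitions of least confidence, margin, and entropy applied to a classifier's predicted class probabilities; the abstract does not give the dissertation's exact formulation, so the function names, the `select_for_labeling` helper, and the example pool below are all hypothetical.

```python
import math

# Each function scores ONE unlabeled sample from its predicted class
# probabilities (a list that sums to 1). These are textbook definitions,
# assumed here; they are not taken from the dissertation itself.

def least_confidence(p):
    # Higher when the most likely class has low probability.
    return 1.0 - max(p)

def margin(p):
    # Gap between the two most likely classes; a SMALL gap means uncertain.
    top = sorted(p, reverse=True)
    return top[0] - top[1]

def entropy(p):
    # Shannon entropy in nats; zero-probability classes contribute nothing.
    return -sum(q * math.log(q) for q in p if q > 0.0)

def select_for_labeling(probs, k, metric="entropy"):
    """Return indices of the k most uncertain samples (hypothetical helper)."""
    scorers = {
        "least_confidence": least_confidence,
        "margin": lambda p: -margin(p),  # negate so larger = more uncertain
        "entropy": entropy,
    }
    score = scorers[metric]
    ranked = sorted(range(len(probs)), key=lambda i: score(probs[i]), reverse=True)
    return ranked[:k]

# Made-up predictions for three unlabeled samples over three classes.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # spread over all classes
    [0.50, 0.50, 0.00],  # tie between two classes
]
for name in ("least_confidence", "margin", "entropy"):
    print(name, select_for_labeling(pool, 1, name))
```

In an active labeling loop, the selected samples would be sent for annotation and added to the labeled set before the next round of fine-tuning. The metrics can disagree: on this pool, margin sampling prefers the two-way tie, while the other two prefer the sample whose probability is spread across all three classes.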