Siamese Network-Based Multi-Modal Deepfake Detection
Deep learning is widely applied to problems in healthcare, robotics, and computer vision. An emerging deep learning application known as "deepfake" has raised concerns about multiple types of security threats that may cause severe harm to personal privacy and public safety. Deep convolutional neural networks (CNNs) such as VGGNet and InceptionNet have recently been proposed for deepfake detection. The main challenge with these CNN-based algorithms is that they require extensive training datasets and high-end GPU resources. Furthermore, existing studies focus mainly on identifying patterns in the facial expressions of deepfakes, and very few address the detection of fake audio. In this thesis, we propose a novel method for uni-modal or multi-modal deepfake detection that requires minimal resources. The proposed solution is a Siamese network-based deepfake detection model trained with variants of contrastive loss and triplet loss. Contrastive loss takes the trained network's output for a positive example, computes its distance to an instance of the same class, and contrasts it with its distance to negative samples. Triplet loss is computed relative to an anchor (baseline) input: it minimizes the distance from the anchor to positive samples while maximizing the distance from the anchor to negative samples. To test and validate the proposed model, we report metrics such as similarity score, loss, and accuracy on the large-scale DFDC, FaceForensics++, and Celeb-DF datasets. Compared with state-of-the-art algorithms, our method improves overall deepfake detection accuracy by 2-3%.
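As a rough illustration of the two training objectives described above, the following Python sketch shows a Siamese setup in which one shared encoder embeds every input, followed by the common textbook forms of contrastive loss (squared distance for same-class pairs, squared hinge on a margin for different-class pairs) and triplet loss (anchor, positive, negative with a margin). This is a sketch under stated assumptions, not the thesis's implementation: the exact loss variants, distance function, encoder, and margins used in the thesis are not given here, and `embed`, `euclidean`, `contrastive_loss`, `triplet_loss`, and the margin values are hypothetical names and settings chosen for illustration.

```python
import math


def embed(x, weights):
    """Hypothetical shared encoder: a single linear layer.

    In a Siamese network every branch uses the SAME weights, so two inputs
    are mapped into one common embedding space where distances are comparable.
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(e1, e2, same, margin=1.0):
    """Contrastive loss for one pair (assumed standard form).

    same=True:  L = d^2                  (pull same-class pairs together)
    same=False: L = max(0, margin - d)^2 (push different-class pairs apart
                                          until they are at least `margin` away)
    """
    d = euclidean(e1, e2)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss (assumed standard form).

    L = max(0, d(anchor, positive) - d(anchor, negative) + margin)
    Zero once the negative is at least `margin` farther from the anchor
    than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    # Identity weights keep the toy example easy to follow.
    W = [[1.0, 0.0], [0.0, 1.0]]
    anchor = embed([0.0, 0.0], W)    # e.g. a real sample
    positive = embed([0.0, 0.5], W)  # another real sample
    negative = embed([0.5, 0.0], W)  # a fake sample, still too close
    print(contrastive_loss(anchor, positive, same=True))   # 0.25
    print(contrastive_loss(anchor, negative, same=False))  # 0.25
    print(triplet_loss(anchor, positive, negative))        # 0.2
```

In a detection setting one could, for example, compare a test sample's embedding with reference embeddings of real samples and flag it as fake when the distance exceeds a threshold; that decision rule is likewise an assumption for illustration, not a description of the thesis's method.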
Table of Contents
Introduction -- Background and related work -- Proposed framework -- Results and evaluations -- Conclusion, limitation and future work
M.S. (Master of Science)