Defending deepfake detector against data poisoning attacks
Abstract
As deep neural networks have made it possible to generate high-quality fake images, fake image detection techniques have become increasingly important as a guard against the spread of misinformation online. However, like other AI models, fake image detectors are themselves vulnerable to machine learning attacks that can compromise their effectiveness in filtering out fake images. In this work, we focus on defending DNN-based fake image detectors against data poisoning attacks, in which attackers attempt to fool the detectors by mislabeling fake images used for training. We design a novel protector model that is capable of distinguishing such poisoned fake images from correctly labeled images. A key advantage of our model is that it can identify new types of poisoned fake images that it has not seen before. We have conducted extensive experimental studies, which demonstrate the high detection accuracy and recall of our model.
Degree
M.S.