Benchmarking and Cross-Platform Evaluation of Public Deepfake Detection Models on Viral Real-World Media
Published in Global Youth Science Journal
Deepfakes pose serious risks to public trust and information integrity, so we tested whether publicly available detection tools reliably identify viral real-world deepfakes. We hypothesized that off-the-shelf detectors would show inconsistent accuracy, producing both false positives and false negatives on in-the-wild videos. To test this, we evaluated 20 viral clips (10 confirmed deepfakes, 10 authentic controls) on two public detection platforms, one ensemble-based and one research-oriented, and recorded ensemble and per-model likelihoods across more than ten detectors. Results revealed substantial cross-platform disagreement: the ensemble platform flagged only a minority of confirmed deepfakes, while the research platform produced extreme per-model score variance, so measured sensitivity depended strongly on how an intermediate "Suspicious" label was mapped to a binary verdict. Depending on that mapping, sensitivity varied widely while specificity remained high for this sample. We conclude that current public detectors provide useful signals but are not yet reliable as sole arbiters of authenticity for viral content. We recommend publishing full per-video numeric outputs and versioned model identifiers, and pairing automated screening with human expert review.
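To make the label-mapping dependence concrete, the sketch below shows how sensitivity and specificity shift when a ternary platform verdict ("Fake" / "Suspicious" / "Real") is binarized two different ways. This is not the paper's analysis code; the verdict counts, function name, and label scheme are illustrative assumptions chosen only to mirror the qualitative pattern the abstract reports.

```python
# Minimal sketch: how the binary mapping of an intermediate "Suspicious"
# verdict changes measured sensitivity/specificity. All verdicts below are
# illustrative placeholders, not the study's actual per-video outputs.

def confusion_rates(verdicts, labels, suspicious_counts_as_fake):
    """Return (sensitivity, specificity) for ternary platform verdicts.

    verdicts: list of "Fake" / "Suspicious" / "Real" platform outputs.
    labels:   list of ground-truth booleans (True = confirmed deepfake).
    suspicious_counts_as_fake: the binary mapping under test.
    """
    def as_fake(v):
        if v == "Fake":
            return True
        if v == "Suspicious":
            return suspicious_counts_as_fake
        return False

    preds = [as_fake(v) for v in verdicts]
    tp = sum(p and y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    tn = sum((not p) and (not y) for p, y in zip(preds, labels))
    fp = sum(p and (not y) for p, y in zip(preds, labels))
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical verdicts: 10 confirmed deepfakes, then 10 authentic controls.
verdicts = (["Fake"] * 4 + ["Suspicious"] * 4 + ["Real"] * 2   # deepfakes
            + ["Real"] * 9 + ["Suspicious"] * 1)               # controls
labels = [True] * 10 + [False] * 10

for mapping in (True, False):
    sens, spec = confusion_rates(verdicts, labels, mapping)
    print(f"Suspicious->fake={mapping}: sensitivity={sens:.2f}, "
          f"specificity={spec:.2f}")
```

With these made-up counts, mapping "Suspicious" to fake yields sensitivity 0.80 and specificity 0.90, while mapping it to real yields 0.40 and 1.00, reproducing the abstract's pattern of widely varying sensitivity against consistently high specificity.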
Recommended citation (MLA): Sarkar, T. "Benchmarking and Cross-Platform Evaluation of Public Deepfake Detection Models on Viral Real-World Media." Global Youth Science Journal, Oct. 2025, https://doi.org/10.5281/zenodo.17250129.