
Famous Figures: A Deepfake Speech Dataset of Public Personalities
Authors: Hashim Ali, Surya Subramani, Raksha Varahamurthy, Nithin Sai Adupa, Lekha Bollinani, Hafiz Malik
Conference: Interspeech 2025
About
The Famous Figures dataset is a curated collection of real and synthetic speech from well-known public figures, including politicians, actors, and activists. It is intended for benchmarking deepfake detection models under realistic threat conditions, in which recordings of the target voices are publicly available.
Dataset Highlights
- 10 public figures
- 10 TTS systems used for synthesis
- 1,000 hours of bonafide speech data
- 10,000 hours of spoof data
- Designed for protecting public figures from voice cloning attacks
Speakers
Antony Blinken, Barack Obama, Donald Trump, JD Vance, Joe Biden, Kamala Harris, Matthew Miller, Tim Walz, Vivek Ramaswamy, and Elon Musk
TTS Systems
StyleTTS2, XTTSv2, F5TTS, E2TTS, FishSpeech, SSRSpeech, MaskGCT, CosyVoice2, LLASA, and Zonosv0.1
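As a rough illustration of the dataset's coverage, the sketch below enumerates speaker and TTS system pairs in Python. It assumes that every speaker was synthesized with every system, which this page does not state explicitly, and the record fields (`speaker`, `tts_system`, `label`) are hypothetical names chosen for the example, not the dataset's actual schema.

```python
from itertools import product

# Speakers and TTS systems as listed on this page.
SPEAKERS = [
    "Antony Blinken", "Barack Obama", "Donald Trump", "JD Vance", "Joe Biden",
    "Kamala Harris", "Matthew Miller", "Tim Walz", "Vivek Ramaswamy", "Elon Musk",
]
TTS_SYSTEMS = [
    "StyleTTS2", "XTTSv2", "F5TTS", "E2TTS", "FishSpeech",
    "SSRSpeech", "MaskGCT", "CosyVoice2", "LLASA", "Zonosv0.1",
]


def build_spoof_grid(speakers, systems):
    """Return one record per (speaker, TTS system) pair.

    Assumption: a full cross product. The real dataset may omit some pairs.
    """
    return [
        {"speaker": speaker, "tts_system": system, "label": "spoof"}
        for speaker, system in product(speakers, systems)
    ]


grid = build_spoof_grid(SPEAKERS, TTS_SYSTEMS)
print(f"{len(grid)} speaker/system conditions")
```

A grid like this can be handy for checking per-condition coverage, or for holding out entire TTS systems when evaluating how well a detector generalizes to unseen synthesizers.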
Resources
- Dataset: available on Hugging Face
How to cite:
@inproceedings{ali25_interspeech,
  title     = {{Collecting, Curating, and Annotating Good Quality Speech deepfake dataset for Famous Figures: Process and Challenges}},
  author    = {Hashim Ali and Surya Subramani and Raksha Varahamurthy and Nithin Adupa and Lekha Bollinani and Hafiz Malik},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {3928--3932},
  doi       = {10.21437/Interspeech.2025-2418},
  issn      = {2958-1796},
}