Famous Figures: A Deepfake Speech Dataset of Public Personalities

Authors: Hashim Ali, Surya Subramani, Raksha Varahamurthy, Nithin Sai Adupa, Lekha Bollinani, Hafiz Malik

Conference: Interspeech 2025

About

The Famous Figures dataset is a curated collection of real and synthetic speech from well-known public figures including politicians, actors, and activists. This dataset aims to benchmark deepfake detection models under realistic threat conditions where target voices are publicly available.
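The page does not say which metric the benchmark reports. Equal error rate (EER) is the customary metric in speech anti-spoofing evaluations such as ASVspoof. The following is a minimal sketch under that assumption. It takes detector scores for bona fide and spoofed clips, assumes that higher scores mean "more likely bona fide", and tries every observed score as a threshold. It then reports the point where the false-rejection and false-acceptance rates are closest. The function name `compute_eer` is hypothetical and is not part of any released tooling for this dataset.

```python
from typing import Sequence, Tuple


def compute_eer(
    bonafide_scores: Sequence[float], spoof_scores: Sequence[float]
) -> Tuple[float, float]:
    """Approximate equal error rate; returns (eer, threshold).

    Assumed convention: a higher score means the detector considers the
    clip more likely to be bona fide.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one score per class")

    best = None  # (|FRR - FAR|, EER estimate, threshold)
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: bona fide clips scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed clips scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            # Average the two rates at the closest crossing point.
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    eer, thr = compute_eer([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7])
    print(f"EER={eer:.2%} at threshold {thr}")
```

The loop is quadratic in the number of scores and is kept that way for readability. For full evaluation lists, a single sorted sweep or a ROC-curve utility would be the practical choice.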

Dataset Highlights

10 public figures
10 TTS systems used for synthesis
1,000 hours of bonafide speech data
10,000 hours of spoof data
Designed to help protect public figures against voice-cloning attacks

Speakers

Antony Blinken, Barack Obama, Donald Trump, JD Vance, Joe Biden, Kamala Harris, Matthew Miller, Tim Walz, Vivek Ramaswamy, and Elon Musk

TTS Systems

StyleTTS2, XTTSv2, F5TTS, E2TTS, FishSpeech, SSRSpeech, MaskGCT, CosyVoice2, LLASA, and Zonos v0.1
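Having ten distinct synthesis systems makes it possible to test whether a detector generalizes to a generator it never saw during training. The page does not define such a protocol. The sketch below is a hypothetical leave-one-system-out split. The `Utterance` record layout (with `system` set to `None` for bona fide speech) and the function name `leave_one_system_out` are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Utterance:
    # Hypothetical record layout; the released dataset may be organized differently.
    path: str
    speaker: str
    system: Optional[str]  # None for bona fide speech, else the TTS system name

    @property
    def is_spoof(self) -> bool:
        return self.system is not None


def leave_one_system_out(
    utterances: Sequence[Utterance], systems: Sequence[str]
) -> Iterator[Tuple[str, List[Utterance], List[Utterance]]]:
    """For each TTS system, yield (held_out, train_spoof, test_spoof).

    train_spoof: spoofed clips from every system except the held-out one.
    test_spoof:  spoofed clips from the held-out system only.
    Bona fide clips are deliberately not handled here; they need their own
    split so that no recording appears in both training and test data.
    """
    for held_out in systems:
        train = [u for u in utterances if u.is_spoof and u.system != held_out]
        test = [u for u in utterances if u.system == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    data = [
        Utterance("a.wav", "Barack Obama", None),
        Utterance("b.wav", "Barack Obama", "StyleTTS2"),
        Utterance("c.wav", "Joe Biden", "XTTSv2"),
        Utterance("d.wav", "Joe Biden", "F5TTS"),
    ]
    for held_out, train, test in leave_one_system_out(data, ["StyleTTS2", "XTTSv2", "F5TTS"]):
        print(held_out, [u.path for u in train], [u.path for u in test])
```

Holding out one system per fold gives ten folds for the ten listed systems. An alternative design, equally hypothetical, would hold out speakers instead, which tests generalization to unseen target voices rather than unseen generators.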

Resources

Dataset: available on Hugging Face

How to cite:

@inproceedings{ali25_interspeech,
  title     = {{Collecting, Curating, and Annotating Good Quality Speech deepfake dataset for Famous Figures: Process and Challenges}},
  author    = {Hashim Ali and Surya Subramani and Raksha Varahamurthy and Nithin Adupa and Lekha Bollinani and Hafiz Malik},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {3928--3932},
  doi       = {10.21437/Interspeech.2025-2418},
  issn      = {2958-1796},
}