Famous Figures Dataset

Famous Figures: A Deepfake Speech Dataset of Public Personalities

Authors: Hashim Ali, Surya Subramani, Raksha Varahamurthy, Nithin Sai Adupa, Lekha Bollinani, Hafiz Malik

Conference: Interspeech 2025

About

The Famous Figures dataset is a curated collection of real and synthetic speech from well-known public figures, including politicians, actors, and activists. It is intended for benchmarking deepfake detection models under realistic threat conditions, where recordings of the target voices are publicly available.

Dataset Highlights
Speakers

Antony Blinken, Barack Obama, Donald Trump, JD Vance, Joe Biden, Kamala Harris, Matthew Miller, Tim Walz, Vivek Ramaswamy, and Elon Musk

TTS Systems

StyleTTS2, XTTSv2, F5TTS, E2TTS, FishSpeech, SSRSpeech, MaskGCT, CosyVoice2, LLASA, and Zonos v0.1

Resources
How to cite:
@inproceedings{ali25_interspeech,
  title     = {{Collecting, Curating, and Annotating Good Quality Speech deepfake dataset for Famous Figures: Process and Challenges}},
  author    = {Hashim Ali and Surya Subramani and Raksha Varahamurthy and Nithin Adupa and Lekha Bollinani and Hafiz Malik},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {3928--3932},
  doi       = {10.21437/Interspeech.2025-2418},
  issn      = {2958-1796},
}