# Audio Deepfake Detection
Response schema and example outputs for AI-ForensiX audio deepfake detection model.
The AI-ForensiX Audio Deepfake Detection model evaluates acoustic, frequency, and speech patterns to determine whether an audio clip is real, synthetic, or manipulated.
The model provides prediction labels, confidence scores, explainability heatmaps, and detailed manipulation type classification.
## Response Schema

### AudioDeepfakeDetectionResult
| Field | Type | Description |
|---|---|---|
| label | string (`"real"` \| `"fake"`) | Indicates whether the audio is authentic or manipulated/synthetic. |
| score | number (0.0 – 1.0) | Confidence score of the prediction based on acoustic analysis. |
| heatmap_url | string (URL), optional | URL to a spectrogram-based heatmap highlighting regions contributing to the decision. |
| source | string (`"real"` \| `"replay"` \| `"tts"` \| `"voice_clone"` \| `"voice_conversion"` \| `"post_processing"`) | Identifies the type of manipulation or authenticity. |
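The result schema can be sketched as a typed structure, e.g. in Python. The field names and allowed values come from the table above; the dataclass and its validation logic are an illustrative client-side sketch, not part of the API itself.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the schema table; not an official SDK constant.
ALLOWED_LABELS = {"real", "fake"}
ALLOWED_SOURCES = {"real", "replay", "tts", "voice_clone",
                   "voice_conversion", "post_processing"}

@dataclass
class AudioDeepfakeDetectionResult:
    label: str                         # "real" or "fake"
    score: float                       # confidence in [0.0, 1.0]
    source: str                        # manipulation/authenticity type
    heatmap_url: Optional[str] = None  # optional spectrogram heatmap URL

    def __post_init__(self):
        # Reject values outside the documented schema.
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")
```

Validating at parse time keeps downstream code from having to re-check the enum values on every use.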
### Source Classification Explanation
| Value | Meaning |
|---|---|
| real | Audio captured from a genuine human speaker. |
| replay | Replay attack detected (re-recorded authentic audio). |
| tts | Text-to-speech synthesized audio. |
| voice_clone | AI-generated cloned voice mimicking a specific speaker. |
| voice_conversion | Speaker identity transformed using voice conversion techniques. |
| post_processing | Edited, spliced, or digitally manipulated audio. |
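For display purposes, the table above can be mirrored as a lookup. The mapping below reuses the wording from the table; the dictionary and helper function names are illustrative, not part of the API.

```python
# Human-readable descriptions for each source value (wording from the table above).
SOURCE_MEANINGS = {
    "real": "Audio captured from a genuine human speaker.",
    "replay": "Replay attack detected (re-recorded authentic audio).",
    "tts": "Text-to-speech synthesized audio.",
    "voice_clone": "AI-generated cloned voice mimicking a specific speaker.",
    "voice_conversion": "Speaker identity transformed using voice conversion techniques.",
    "post_processing": "Edited, spliced, or digitally manipulated audio.",
}

def describe_source(source: str) -> str:
    """Return a user-facing description for a source value."""
    return SOURCE_MEANINGS.get(source, f"unknown source: {source}")
```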
## Example Responses
Listing 5: Fake Audio Detection Example

```json
{
  "label": "fake",
  "score": 0.912,
  "heatmap_url": "https://forensiX.com/.png",
  "source": "voice_clone"
}
```

Listing 6: Real Audio Detection Example
```json
{
  "label": "real",
  "score": 0.987,
  "source": "real"
}
```
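A client might consume a response like the examples above by parsing the JSON and applying a confidence threshold. The sketch below uses the fake-audio example payload; the 0.8 threshold is an illustrative choice, not part of the API.

```python
import json

# Response body matching the fake-audio example above (heatmap_url omitted,
# since it is optional).
response_body = """
{
  "label": "fake",
  "score": 0.912,
  "source": "voice_clone"
}
"""

result = json.loads(response_body)

# Flag the clip when the model labels it fake with high confidence.
# The 0.8 threshold is an illustrative choice, not part of the API.
is_flagged = result["label"] == "fake" and result["score"] >= 0.8
print(is_flagged)  # True for this example
```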