# Audio Deepfake Detection
Response schema and example outputs for AI-ForensiX audio deepfake detection model.
The AI-ForensiX Audio Deepfake Detection model evaluates acoustic, frequency, and speech patterns to determine whether an audio clip is real, synthetic, or manipulated.
The model provides prediction labels, confidence scores, explainability heatmaps, and detailed manipulation type classification.
## Response Schema

### AudioDeepfakeDetectionResult
| Field | Type | Description |
|---|---|---|
| label | string (`"real"` \| `"fake"`) | Indicates whether the audio is authentic or manipulated/synthetic. |
| score | number (0.0 – 1.0) | Confidence score of the prediction based on acoustic analysis. |
| heatmap_url | string (URL), optional | URL to a spectrogram-based heatmap highlighting regions contributing to the decision. |
| source | string (`"real"` \| `"replay"` \| `"tts"` \| `"voice_clone"` \| `"voice_conversion"` \| `"post_processing"`) | Identifies the type of manipulation or authenticity. |
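The result schema can be sketched as a typed structure, e.g. in Python. The field names and allowed values come from the table above; the dataclass and its validation logic are an illustrative client-side sketch, not part of the API itself.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the schema table; not an official SDK constant.
ALLOWED_LABELS = {"real", "fake"}
ALLOWED_SOURCES = {"real", "replay", "tts", "voice_clone",
                   "voice_conversion", "post_processing"}

@dataclass
class AudioDeepfakeDetectionResult:
    label: str                         # "real" or "fake"
    score: float                       # confidence in [0.0, 1.0]
    source: str                        # manipulation/authenticity type
    heatmap_url: Optional[str] = None  # optional spectrogram heatmap URL

    def __post_init__(self):
        # Reject values outside the documented schema.
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")
```

Validating at parse time keeps downstream code from having to re-check the enum values on every use.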
### Source Classification Explanation
| Value | Meaning |
|---|---|
| real | Audio captured from a genuine human speaker. |
| replay | Replay attack detected (re-recorded authentic audio). |
| tts | Text-to-speech synthesized audio. |
| voice_clone | AI-generated cloned voice mimicking a specific speaker. |
| voice_conversion | Speaker identity transformed using voice conversion techniques. |
| post_processing | Edited, spliced, or digitally manipulated audio. |
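For display purposes, the table above can be mirrored as a lookup. The mapping below reuses the wording from the table; the dictionary and helper function names are illustrative, not part of the API.

```python
# Human-readable descriptions for each source value (wording from the table above).
SOURCE_MEANINGS = {
    "real": "Audio captured from a genuine human speaker.",
    "replay": "Replay attack detected (re-recorded authentic audio).",
    "tts": "Text-to-speech synthesized audio.",
    "voice_clone": "AI-generated cloned voice mimicking a specific speaker.",
    "voice_conversion": "Speaker identity transformed using voice conversion techniques.",
    "post_processing": "Edited, spliced, or digitally manipulated audio.",
}

def describe_source(source: str) -> str:
    """Return a user-facing description for a source value."""
    return SOURCE_MEANINGS.get(source, f"unknown source: {source}")
```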
## Example Responses
Listing 5: Fake Audio Detection Example

```json
{
  "label": "fake",
  "score": 0.912,
  "heatmap_url": "https://forensiX.com/.png",
  "source": "voice_clone"
}
```

Listing 6: Real Audio Detection Example
```json
{
  "label": "real",
  "score": 0.987,
  "source": "real"
}
```
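A client might consume a response like the examples above by parsing the JSON and applying a confidence threshold. The sketch below uses the fake-audio example payload; the 0.8 threshold is an illustrative choice, not part of the API.

```python
import json

# Response body matching the fake-audio example above (heatmap_url omitted,
# since it is optional).
response_body = """
{
  "label": "fake",
  "score": 0.912,
  "source": "voice_clone"
}
"""

result = json.loads(response_body)

# Flag the clip when the model labels it fake with high confidence.
# The 0.8 threshold is an illustrative choice, not part of the API.
is_flagged = result["label"] == "fake" and result["score"] >= 0.8
print(is_flagged)  # True for this example
```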