Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models generalise well to many datasets and domains without fine-tuning. The primary intended users of these models are AI researchers studying the robustness, generalisation, and capabilities of the current model. The whisper-medium.en checkpoint served here is the English-only medium-size variant, so it transcribes English speech rather than translating it.
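You can use cURL or any other HTTP client to run inference. Send the audio file as the multipart form field audio and pass your API token in the Authorization header: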
curl -X POST \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-F audio=@my_voice.mp3 \
'https://api.deepinfra.com/v1/inference/openai/whisper-medium.en'
The response is a JSON object similar to the following (the values shown are illustrative):
{
  "text": "",
  "segments": [
    {
      "end": 1.0,
      "id": 0,
      "start": 0.0,
      "text": "Hello"
    },
    {
      "end": 5.0,
      "id": 1,
      "start": 4.0,
      "text": "World"
    }
  ],
  "language": "en",
  "input_length_ms": 0,
  "words": [
    {
      "end": 1.0,
      "start": 0.0,
      "text": "Hello"
    },
    {
      "end": 5.0,
      "start": 4.0,
      "text": "World"
    }
  ],
  "duration": 0.0,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0,
    "output_length": 0
  }
}
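Because the endpoint is plain HTTP, any client can call it. The following is a minimal sketch using only Python's standard library. It builds the same multipart request as the cURL command (form field audio, bearer token read from DEEPINFRA_TOKEN), then turns the segments of a response like the sample into timestamped lines. The helper names (build_request, transcribe, format_segments, full_transcript), the application/octet-stream part type, and the fallback to joined segment text when the top-level "text" is empty are illustrative assumptions, not part of any DeepInfra SDK. The field names are taken from the sample response.

```python
import json
import os
import urllib.request
import uuid

# Endpoint taken from the cURL example.
API_URL = "https://api.deepinfra.com/v1/inference/openai/whisper-medium.en"


def build_request(audio: bytes, filename: str, token: str) -> urllib.request.Request:
    """Hypothetical helper: a multipart/form-data POST equivalent to `curl -F audio=@file`."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed generic part type
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        API_URL,
        data=head + audio + tail,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(path: str) -> dict:
    """Hypothetical helper: upload an audio file and return the decoded JSON (needs network)."""
    token = os.environ["DEEPINFRA_TOKEN"]
    with open(path, "rb") as f:
        req = build_request(f.read(), os.path.basename(path), token)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def format_segments(response: dict) -> list[str]:
    """Render each segment as "[start-end s] text" using the sample response's fields."""
    return [
        f"[{seg['start']:.1f}-{seg['end']:.1f} s] {seg['text']}"
        for seg in response.get("segments", [])
    ]


def full_transcript(response: dict) -> str:
    """Prefer the top-level "text"; if it is empty, join the segment texts instead."""
    text = (response.get("text") or "").strip()
    if text:
        return text
    return " ".join(seg["text"].strip() for seg in response.get("segments", []))


# Trimmed copy of the sample response, so the parsing helpers run offline.
SAMPLE = {
    "text": "",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.0, "text": "Hello"},
        {"id": 1, "start": 4.0, "end": 5.0, "text": "World"},
    ],
}

if __name__ == "__main__":
    for line in format_segments(SAMPLE):
        print(line)  # [0.0-1.0 s] Hello, then [4.0-5.0 s] World
    print(full_transcript(SAMPLE))  # Hello World
```

Calling transcribe("my_voice.mp3") performs the actual network request. The parsing helpers can be tried offline on SAMPLE. Building the multipart body by hand avoids a third-party dependency. A library such as requests can assemble the multipart body for you instead.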
© 2026 Deep Infra. All rights reserved.