openai/whisper-tiny.en
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, it generalizes to many domains without the need for fine-tuning. It is a Transformer-based encoder-decoder model, trained on either English-only or multilingual data, and predicts transcriptions in the same language as the audio or translated into a different one. Whisper checkpoints come in five configurations of varying model sizes.
You can use cURL or any other HTTP client to run inference:
curl -X POST \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-F audio=@my_voice.mp3 \
'https://api.deepinfra.com/v1/inference/openai/whisper-tiny.en'
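The same request can be made from Python. Below is a minimal stdlib-only sketch that builds the `multipart/form-data` body by hand and POSTs the audio file with the `Authorization` header shown in the cURL example; the `build_multipart` and `transcribe` helper names are ours, not part of any DeepInfra SDK.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.deepinfra.com/v1/inference/openai/whisper-tiny.en"

def build_multipart(field: str, filename: str, data: bytes):
    """Build a multipart/form-data body containing a single file field.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"

def transcribe(path: str, token: str) -> dict:
    """POST an audio file to the Whisper endpoint and return the parsed JSON."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("audio", os.path.basename(path), f.read())
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"bearer {token}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a valid token and an audio file on disk):
# result = transcribe("my_voice.mp3", os.environ["DEEPINFRA_TOKEN"])
# print(result["text"])
```

In production you would more likely use the `requests` library or DeepInfra's OpenAI-compatible client, but the hand-rolled multipart body makes the wire format explicit.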
which will give you back something similar to:
{
  "text": "",
  "segments": [
    {
      "end": 1.0,
      "id": 0,
      "start": 0.0,
      "text": "Hello"
    },
    {
      "end": 5.0,
      "id": 1,
      "start": 4.0,
      "text": "World"
    }
  ],
  "language": "en",
  "input_length_ms": 0,
  "words": [
    {
      "end": 1.0,
      "start": 0.0,
      "text": "Hello"
    },
    {
      "end": 5.0,
      "start": 4.0,
      "text": "World"
    }
  ],
  "duration": 0.0,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
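Each entry in `segments` carries `start`/`end` timestamps in seconds, which is enough to generate subtitles. As one possible use, here is a small sketch (the `to_srt` helper is ours, not part of the API) that converts a parsed response into SubRip (SRT) format:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(response: dict) -> str:
    """Render the 'segments' array of a Whisper response as SRT subtitles."""
    blocks = []
    for i, seg in enumerate(response["segments"], start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Using the example response above:
resp = {"segments": [
    {"end": 1.0, "id": 0, "start": 0.0, "text": "Hello"},
    {"end": 5.0, "id": 1, "start": 4.0, "text": "World"},
]}
print(to_srt(resp))
```

Note that in the example response the top-level `text` field is empty; when assembling a full transcript, prefer joining the `segments` (or `words`) entries.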
© 2025 Deep Infra. All rights reserved.