Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and generalizes well to many datasets and domains without fine-tuning. The model uses a Transformer encoder-decoder architecture. The multilingual Whisper models support many languages, including English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, and Korean.
You can use cURL or any other HTTP client to run inference:
curl -X POST \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-F audio=@my_voice.mp3 \
'https://api.deepinfra.com/v1/inference/openai/whisper-base'
which returns a JSON response similar to the following:
{
"text": "",
"segments": [
{
"end": 1.0,
"id": 0,
"start": 0.0,
"text": "Hello"
},
{
"end": 5.0,
"id": 1,
"start": 4.0,
"text": "World"
}
],
"language": "en",
"input_length_ms": 0,
"words": [
{
"end": 1.0,
"start": 0.0,
"text": "Hello"
},
{
"end": 5.0,
"start": 4.0,
"text": "World"
}
],
"duration": 0.0,
"request_id": null,
"inference_status": {
"status": "unknown",
"runtime_ms": 0,
"cost": 0.0,
"tokens_generated": 0,
"tokens_input": 0
}
}
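The same request can also be made from Python. The sketch below is a minimal, standard-library-only illustration, not an official DeepInfra client. It reuses the endpoint, the `audio` form field and the `bearer` authorization header from the cURL example, hand-builds the multipart body that `curl -F` would send, and turns the `segments` array into timestamped lines. The function names (`build_request`, `transcribe`, `segment_lines`) are hypothetical, and segment `start`/`end` values are assumed to be in seconds.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Endpoint, "audio" field name and "bearer" scheme are taken from the cURL example.
API_URL = "https://api.deepinfra.com/v1/inference/openai/whisper-base"


def build_request(audio: bytes, filename: str, token: str) -> urllib.request.Request:
    """Build a multipart/form-data POST roughly equivalent to `curl -F audio=@<file>`."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # Single form part named "audio" carrying the raw file bytes.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        API_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str, token: str) -> dict:
    """Upload an audio file and return the decoded JSON response (hypothetical helper)."""
    with open(path, "rb") as f:
        req = build_request(f.read(), os.path.basename(path), token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def format_timestamp(seconds: float) -> str:
    """Render a time as MM:SS.mmm, assuming the value is in seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:06.3f}"


def segment_lines(response: dict) -> list[str]:
    """Turn the "segments" array into one timestamped line per segment."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in response.get("segments", [])
    ]
```

Calling `transcribe("my_voice.mp3", os.environ["DEEPINFRA_TOKEN"])` and passing the result to `segment_lines` would, for the sample response above, produce `[00:00.000 -> 00:01.000] Hello` and `[00:04.000 -> 00:05.000] World`. Building the body by hand keeps the sketch dependency-free; a third-party HTTP library would handle the multipart encoding for you.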
© 2025 Deep Infra. All rights reserved.