Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrates a strong ability to generalize to many datasets and domains without fine-tuning. Whisper is a Transformer-based encoder-decoder model trained on English-only or multilingual data. The English-only models were trained on speech recognition, while the multilingual models were trained on both speech recognition and speech translation.
You can use cURL or any other HTTP client to run inference:
curl -X POST \
  -H "Authorization: bearer $(deepctl auth token)" \
  -F audio=@my_voice.mp3 \
  'https://api.deepinfra.com/v1/inference/openai/whisper-tiny'
The response will look similar to this:
{
  "text": "",
  "segments": [
    {
      "id": 0,
      "text": "Hello",
      "start": 0.0,
      "end": 1.0
    },
    {
      "id": 1,
      "text": "World",
      "start": 4.0,
      "end": 5.0
    }
  ],
  "language": "en",
  "input_length_ms": 0,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
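The response shown above can be consumed with a few lines of Python. The following is a minimal sketch, assuming only the fields visible in the sample (`text`, `segments`, `start`, `end`); the helper names `format_segments` and `full_text` are hypothetical and not part of any client library.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "text": "",
  "segments": [
    {"id": 0, "text": "Hello", "start": 0.0, "end": 1.0},
    {"id": 1, "text": "World", "start": 4.0, "end": 5.0}
  ],
  "language": "en"
}
"""


def format_segments(response):
    """Return one '[start - end] text' line per segment."""
    return [
        f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']}"
        for seg in response.get("segments", [])
    ]


def full_text(response):
    """Return the top-level text, or the joined segment texts if it is empty."""
    joined = " ".join(seg["text"].strip() for seg in response.get("segments", []))
    return response.get("text") or joined


response = json.loads(SAMPLE)
for line in format_segments(response):
    print(line)
print(full_text(response))
```

Falling back to the joined segment texts is a design choice made here because the sample's top-level `text` is empty; a real response may populate it.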
condition_on_previous_text
boolean: Provide the previous output of the model as a prompt for the next window.
Default value: true
temperature_increment_on_fallback
number: Amount by which to increase the temperature on fallback, when decoding fails to meet either of the thresholds below.
Default value: 0.2
webhook
file: The webhook to call when inference is done. By default, you will get the output in the response of your inference request.
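As a rough illustration of how these parameters might be sent from Python, the sketch below builds the same multipart request as the cURL example using only the standard library. It rests on assumptions not stated in this page: that the optional parameters are passed as extra form fields next to `audio`, and that booleans are encoded as lowercase `true`/`false`. The helpers `encode_multipart` and `transcribe` are hypothetical names.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.deepinfra.com/v1/inference/openai/whisper-tiny"


def encode_multipart(fields, audio_name, audio_bytes):
    """Build a multipart/form-data body. Returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # Assumption: booleans are sent as lowercase "true"/"false".
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file goes in the "audio" field, as in the cURL example.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{audio_name}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(token, audio_path, **params):
    """POST an audio file plus optional parameters; return the parsed JSON."""
    with open(audio_path, "rb") as f:
        body, content_type = encode_multipart(
            params, os.path.basename(audio_path), f.read()
        )
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"bearer {token}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would then look like `transcribe(token, "my_voice.mp3", condition_on_previous_text=False, temperature_increment_on_fallback=0.2)`, where `token` is the value printed by `deepctl auth token`.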