openai/whisper-large

Whisper is a general-purpose speech recognition model. Trained on a large dataset of diverse audio, it is a multi-task model that can perform multilingual speech recognition, speech translation, and language identification.

Public
$0.0005 / sec
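
As a rough illustration of the listed price, the arithmetic can be sketched in Python. The page does not say whether "sec" means seconds of audio or seconds of compute, so the helper below (a hypothetical name, `estimate_cost`) simply multiplies a number of billed seconds by the rate. The `cost` field returned with each response should be treated as the authoritative figure.

```python
from decimal import Decimal

# Listed price from this page; what counts as a billed second is an assumption.
PRICE_PER_SECOND = Decimal("0.0005")


def estimate_cost(billed_seconds):
    """Return the estimated cost in USD for a given number of billed seconds."""
    return PRICE_PER_SECOND * Decimal(str(billed_seconds))


if __name__ == "__main__":
    # One minute and one hour of billed time.
    print(estimate_cost(60))
    print(estimate_cost(3600))
```

`Decimal` is used instead of `float` so that small per-second prices do not pick up rounding noise.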

HTTP/cURL API

You can use cURL or any other HTTP client to run inference:

curl -X POST \
    -H "Authorization: bearer $(deepctl auth token)"  \
    -F audio=@my_voice.mp3  \
    'https://api.deepinfra.com/v1/inference/openai/whisper-large'

The response will look similar to the following:

{
  "text": "",
  "segments": [
    {
      "id": 0,
      "text": "Hello",
      "start": 0.0,
      "end": 1.0
    },
    {
      "id": 1,
      "text": "World",
      "start": 4.0,
      "end": 5.0
    }
  ],
  "language": "en",
  "input_length_ms": 0,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
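
A response shaped like the one above can be post-processed with a few lines of Python. This is a minimal sketch that assumes the response has already been parsed into a dict with the `text` and `segments` fields shown; the helper names `format_segments` and `full_text` are illustrative and not part of any official client.

```python
import json


def format_segments(response):
    """Return one line per segment with its start and end time in seconds."""
    lines = []
    for seg in response.get("segments", []):
        text = seg["text"].strip()
        lines.append(f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {text}")
    return lines


def full_text(response):
    """Prefer the top-level text; fall back to joining the segment texts."""
    if response.get("text"):
        return response["text"]
    return " ".join(seg["text"].strip() for seg in response.get("segments", []))


# Trimmed copy of the sample response shown above.
sample = json.loads("""
{
  "text": "",
  "segments": [
    {"id": 0, "text": "Hello", "start": 0.0, "end": 1.0},
    {"id": 1, "text": "World", "start": 4.0, "end": 5.0}
  ],
  "language": "en"
}
""")

if __name__ == "__main__":
    for line in format_segments(sample):
        print(line)
    print(full_text(sample))
```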

Input fields

audio (string)

audio to transcribe


task (string)

task to perform

Default value: transcribe

Allowed values: transcribe, translate


language (string)

language spoken in the audio; the detected language is used if this is None


temperature (number)

temperature to use for sampling

Default value: 0


patience (number)

patience value to use in beam decoding; the default of 1 is equivalent to conventional beam search

Default value: 1


suppress_tokens (string)

comma-separated list of token ids to suppress during sampling; -1 suppresses most special characters except common punctuation

Default value: -1


initial_prompt (string)

optional text to provide as a prompt for the first window


condition_on_previous_text (boolean)

if true, provide the previous output of the model as a prompt for the next window

Default value: true


temperature_increment_on_fallback (number)

amount by which to increase the temperature on each retry when decoding fails to meet either of the thresholds below

Default value: 0.2


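compression_ratio_threshold (number)

gzip compression ratio threshold; if the compression ratio of the decoded text is higher than this value, the decoding is treated as failed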

Default value: 2.4
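
To give a feel for what a compression ratio measures, here is a minimal sketch. It assumes the ratio is the size of the UTF-8 text divided by the size of its zlib-compressed form, which is how the open-source Whisper code computes it; whether the hosted endpoint does exactly the same is an assumption. Highly repetitive output (a common failure mode) compresses very well and so yields a high ratio.

```python
import zlib


def compression_ratio(text):
    """Ratio of raw UTF-8 size to zlib-compressed size (higher = more repetitive)."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


if __name__ == "__main__":
    normal = "The quick brown fox jumps over the lazy dog."
    looping = "hello " * 100  # degenerate, repeated output
    print(round(compression_ratio(normal), 2))
    print(round(compression_ratio(looping), 2))
```

With the default threshold of 2.4, the repeated text would count as a failed decoding while the ordinary sentence would not.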


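logprob_threshold (number)

average log probability threshold; if the average log probability of the decoded tokens is lower than this value, the decoding is treated as failed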

Default value: -1


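no_speech_threshold (number)

threshold on the probability of the <|nospeech|> token; if the probability is higher than this value and the decoding has failed due to logprob_threshold, the segment is treated as silence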

Default value: 0.6
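
The interaction between `temperature`, `temperature_increment_on_fallback`, and the three thresholds can be sketched as follows. This mirrors the fallback loop of the open-source Whisper implementation as far as the field descriptions here go; the hosted endpoint's internals are not documented on this page, so treat it as an assumption. `decode` is a hypothetical callable that takes a temperature and returns a dict with `compression_ratio` and `avg_logprob`.

```python
def temperature_schedule(start=0.0, increment=0.2, maximum=1.0):
    """Temperatures to try in order: start, start + increment, ... up to maximum."""
    if increment <= 0:
        return [start]
    temps = []
    i = 0
    while start + i * increment <= maximum + 1e-6:
        temps.append(round(start + i * increment, 6))
        i += 1
    return temps


def needs_fallback(compression_ratio, avg_logprob,
                   compression_ratio_threshold=2.4, logprob_threshold=-1.0):
    """A decoding fails if it is too repetitive or too improbable."""
    too_repetitive = compression_ratio > compression_ratio_threshold
    too_unlikely = avg_logprob < logprob_threshold
    return too_repetitive or too_unlikely


def is_silence(no_speech_prob, avg_logprob,
               no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Treat a segment as silence if no-speech is likely and the logprob check failed."""
    return no_speech_prob > no_speech_threshold and avg_logprob < logprob_threshold


def decode_with_fallback(decode, start=0.0, increment=0.2):
    """Retry decoding at increasing temperatures until the thresholds are met."""
    result = None
    for temperature in temperature_schedule(start, increment):
        result = decode(temperature)  # hypothetical decoder call
        if not needs_fallback(result["compression_ratio"], result["avg_logprob"]):
            break
    return result
```

With the defaults, decoding starts greedily at temperature 0 and retries with progressively more sampling randomness only when a threshold is violated.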


webhook (file)

The webhook to call when inference is done. By default, the output is returned in the response to your inference request.
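
The same request as the cURL example, with some of the optional input fields added, might look like this in Python using only the standard library. This is a sketch under several assumptions: that the optional fields are sent as extra multipart form fields next to `audio`, that booleans are serialized as `true`/`false`, and that the caller supplies a token (for example the output of `deepctl auth token`). The helper names `build_multipart` and `transcribe` are hypothetical.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.deepinfra.com/v1/inference/openai/whisper-large"


def build_multipart(fields, audio_name, audio_bytes, boundary=None):
    """Encode text fields plus one audio file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean encoding
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="audio"; filename="{audio_name}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path, token, **fields):
    """POST an audio file and optional input fields; return the parsed JSON."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    body, content_type = build_multipart(
        fields, os.path.basename(audio_path), audio_bytes
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"bearer {token}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `transcribe("my_voice.mp3", token, task="translate", temperature=0)` would then return a dict shaped like the sample response.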
