distil-whisper/distil-large-v3


Distil-Whisper was proposed in the paper Robust Knowledge Distillation via Large-Scale Pseudo Labelling. This is the third and final installment of the Distil-Whisper English series. It is the knowledge-distilled version of OpenAI's Whisper large-v3, the latest and most performant Whisper model to date. Compared to previous Distil-Whisper models, the distillation procedure for distil-large-v3 has been adapted to give superior long-form transcription accuracy with OpenAI's sequential long-form algorithm.


$0.00018 / minute
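
To get a feel for what the per-minute rate means in practice, the sketch below estimates the cost of transcribing a clip. It assumes billing is strictly proportional to audio duration, with no minimum charge and no rounding, which this page does not confirm. The names `PRICE_PER_MINUTE_USD` and `estimate_cost` are illustrative only.

```python
# Rate quoted on this page, in US dollars per minute of audio.
PRICE_PER_MINUTE_USD = 0.00018


def estimate_cost(duration_seconds: float) -> float:
    """Estimate the transcription cost in USD.

    Assumption: cost scales linearly with duration (no minimum, no rounding).
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60.0 * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # One hour of audio under the linear-billing assumption.
    print(f"${estimate_cost(3600):.4f}")
```

Under that assumption, one hour of audio comes to roughly $0.0108.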


You can use cURL or any other HTTP client to run inference:

curl -X POST \
    -H "Authorization: bearer $DEEPINFRA_TOKEN"  \
    -F audio=@my_voice.mp3  \

which will give you back something similar to:

{
  "text": "",
  "segments": [
    {
      "id": 0,
      "text": "Hello",
      "start": 0.0,
      "end": 1.0
    },
    {
      "id": 1,
      "text": "World",
      "start": 4.0,
      "end": 5.0
    }
  ],
  "language": "en",
  "input_length_ms": 0,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
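
A response of this shape can be post-processed with a few lines of Python. The following is a minimal sketch, not an official client: it embeds a trimmed copy of the sample response, assumes `start` and `end` are offsets in seconds, and falls back to joining segment texts when the top-level `text` is empty (a design choice made here, not documented behavior). The helper names `full_text` and `segments_to_lines` are hypothetical.

```python
import json
from typing import List

# Trimmed copy of the sample response shown on this page.
SAMPLE_RESPONSE = """
{
  "text": "",
  "segments": [
    {"id": 0, "text": "Hello", "start": 0.0, "end": 1.0},
    {"id": 1, "text": "World", "start": 4.0, "end": 5.0}
  ],
  "language": "en"
}
"""


def full_text(response: dict) -> str:
    """Return the top-level text, or the joined segment texts if it is empty."""
    text = (response.get("text") or "").strip()
    if text:
        return text
    segments = response.get("segments") or []
    return " ".join(seg["text"].strip() for seg in segments)


def segments_to_lines(response: dict) -> List[str]:
    """Format each segment as '[start - end] text' (times assumed to be seconds)."""
    segments = response.get("segments") or []
    return [
        f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']}"
        for seg in segments
    ]


if __name__ == "__main__":
    parsed = json.loads(SAMPLE_RESPONSE)
    print(full_text(parsed))
    for line in segments_to_lines(parsed):
        print(line)
```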

Input fields


audio to transcribe


task to perform

Default value: "transcribe"

Allowed values: transcribe, translate


optional text to provide as a prompt for the first window.


temperature to use for sampling

Default value: 0


language that the audio is in, given as a two-letter ISO 639-1 code (e.g. en, de, ja); if None, the detected language is used


The webhook to call when inference is done. By default, the output is returned in the response to your inference request.
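
The constraints listed above (allowed task values, the default temperature, two-letter language codes) can be checked on the client before a request is sent. The sketch below is one way to do that. Only the `audio` field name appears on this page (in the cURL example); the keys `task`, `initial_prompt`, `temperature`, `language`, and `webhook` are assumed names and should be verified against the input schema. The non-negative temperature check is a sanity check added here, not a documented rule.

```python
import re
from typing import Dict, Optional

# Allowed values for the task input, as listed on this page.
ALLOWED_TASKS = ("transcribe", "translate")


def build_form_fields(
    task: str = "transcribe",
    initial_prompt: Optional[str] = None,
    temperature: float = 0.0,
    language: Optional[str] = None,
    webhook: Optional[str] = None,
) -> Dict[str, str]:
    """Validate optional inputs and return them as string form fields.

    All keys are hypothetical names; the audio file is attached separately.
    """
    if task not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {ALLOWED_TASKS}")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if language is not None and not re.fullmatch(r"[a-z]{2}", language):
        raise ValueError("language must be a two-letter ISO 639-1 code")

    fields = {"task": task, "temperature": str(temperature)}
    # Only include optional fields the caller actually set.
    if initial_prompt is not None:
        fields["initial_prompt"] = initial_prompt
    if language is not None:
        fields["language"] = language
    if webhook is not None:
        fields["webhook"] = webhook
    return fields


if __name__ == "__main__":
    print(build_form_fields(task="translate", language="de"))
```

The returned dictionary could then be sent as extra form fields alongside `audio` by whichever HTTP client you use.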
