Distil-Whisper was proposed in the paper Robust Knowledge Distillation via Large-Scale Pseudo Labelling. This is the third and final installment of the Distil-Whisper English series. It is the knowledge-distilled version of OpenAI's Whisper large-v3, the latest and most performant Whisper model to date. Compared to previous Distil-Whisper models, the distillation procedure for distil-large-v3 has been adapted to give superior long-form transcription accuracy with OpenAI's sequential long-form algorithm.
You can POST to our OpenAI-compatible Transcriptions and Translations endpoints.
For a given audio file and model, the endpoint will return the transcription object or a verbose transcription object.
Supported file formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
model: distil-whisper/distil-large-v3 for this case. For other models, refer to models/automatic-speech-recognition.json.
response_format: json (default), text, srt, verbose_json, vtt.
timestamp_granularities: requires response_format to be set to verbose_json. Options: word - generates timestamps for individual words; segment - generates timestamps for segments. Note: there is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.
Returns: the transcription object or a verbose transcription object.
curl "https://api.deepinfra.com/v1/openai/audio/transcriptions" \
-H "Content-Type: multipart/form-data" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-F file="@/path/to/file/audio.mp3" \
-F model="distil-whisper/distil-large-v3"
curl "https://api.deepinfra.com/v1/openai/audio/transcriptions" \
-H "Content-Type: multipart/form-data" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-F file="@/path/to/file/audio.mp3" \
-F model="distil-whisper/distil-large-v3" \
-F response_format="verbose_json" \
-F "timestamp_granularities[]=word"
curl "https://api.deepinfra.com/v1/openai/audio/transcriptions" \
-H "Content-Type: multipart/form-data" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-F file="@/path/to/file/audio.mp3" \
-F model="distil-whisper/distil-large-v3" \
-F response_format="verbose_json" \
-F "timestamp_granularities[]=segment"
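Once the request above returns, the word-level timestamps arrive inside the verbose transcription object. The following Python sketch shows how one might pull them out; the payload here is illustrative sample data in the OpenAI verbose-transcription shape (fields text, duration, words with word/start/end), not actual API output.

```python
# Sketch: extracting word-level timestamps from a verbose_json
# transcription response. sample_response is illustrative sample data,
# not real API output.

sample_response = {
    "task": "transcribe",
    "language": "english",
    "duration": 1.2,
    "text": "Hello world",
    "words": [
        {"word": "Hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.6, "end": 1.1},
    ],
}

def word_timestamps(response: dict) -> list:
    """Return (word, start, end) tuples from a verbose_json response.

    Responses without word-level granularity simply yield an empty list.
    """
    return [(w["word"], w["start"], w["end"]) for w in response.get("words", [])]

for word, start, end in word_timestamps(sample_response):
    print(f"{start:5.2f}-{end:5.2f}  {word}")
```

The same pattern applies to segment granularity: read `response["segments"]` instead of `response["words"]`.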
For a given audio file and model, the endpoint will return the text translated into English.
Supported file formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
model: distil-whisper/distil-large-v3 for this case. For other models, refer to models/automatic-speech-recognition.json.
response_format: json (default), text, srt, verbose_json, vtt.
Returns: the text translated into English.
curl "https://api.deepinfra.com/v1/openai/audio/translations" \
-H "Content-Type: multipart/form-data" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-F file="@/path/to/file/german.m4a" \
-F model="distil-whisper/distil-large-v3"
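The same translation request can be issued from Python. The sketch below builds the headers and form fields, assuming DEEPINFRA_TOKEN is set in the environment as in the curl examples; the network call itself uses the third-party requests library and is left commented out since it needs a valid token and audio file.

```python
# Sketch: building the translation request from the curl example above.
# Assumes DEEPINFRA_TOKEN is set in the environment.
import os

API_URL = "https://api.deepinfra.com/v1/openai/audio/translations"

def build_request(model: str = "distil-whisper/distil-large-v3"):
    """Return (headers, form_data) for a translation request.

    The audio file is attached separately via requests' files= argument,
    which also makes requests set the multipart/form-data content type.
    """
    headers = {"Authorization": f"Bearer {os.environ.get('DEEPINFRA_TOKEN', '')}"}
    form_data = {"model": model}
    return headers, form_data

headers, form_data = build_request()

# Uncomment to send the request (requires a valid token and audio file):
# import requests
# with open("german.m4a", "rb") as f:
#     resp = requests.post(API_URL, headers=headers, data=form_data,
#                          files={"file": f})
# print(resp.json()["text"])
```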
response_format (string): The format of the output. Default value: "json". Allowed values: json, verbose_json, text, srt, vtt.
temperature (number): The sampling temperature, between 0 and 1. Higher values produce more creative results. Default value: 0. Range: 0 ≤ temperature ≤ 1.
timestamp_granularities (array): An array specifying the granularity of timestamps to include in the transcription. Possible values: segment, word.
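To illustrate the srt response format, the sketch below renders a list of verbose_json-style segments as SubRip text. The segment list is illustrative sample data, and real model output may differ in detail; only the SRT timestamp layout (HH:MM:SS,mmm) is fixed by the format.

```python
# Sketch: rendering verbose_json-style segments as SRT, to show what
# response_format="srt" produces. The segments below are sample data.

def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list) -> str:
    """Join numbered SRT blocks: index, time range, then the text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " How are you?"},
]
print(segments_to_srt(segments))
```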