openai/
$0.00020 / minute
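As a rough illustration of what the per-minute price means in practice, the sketch below estimates the cost of transcribing a clip of a given length. It assumes billing scales linearly with audio duration and applies no rounding or minimum charge; neither assumption is stated on this page, and the function name is hypothetical.

```python
# Price quoted on this page, in US dollars per minute of audio.
PRICE_PER_MINUTE_USD = 0.00020


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the cost of processing `duration_seconds` of audio.

    Assumption: cost is strictly proportional to duration, with no
    rounding to whole minutes and no minimum charge.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60.0 * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # One hour of audio is 60 minutes at the quoted per-minute price.
    print(f"{estimate_cost_usd(3600):.5f} USD")
```

Under these assumptions, cost grows linearly with audio length, so a ten-hour recording would cost ten times as much as a one-hour one.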
Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3. In other words, it is the same model, except that the number of decoding layers has been reduced from 32 to 4. As a result, the model is much faster, at the expense of a minor quality degradation.

Settings
Service Tier
The service tier used for processing the request. When set to 'priority', the request will be processed with higher priority (only applies to models that support it).
Task
task to perform
Initial Prompt
optional text to provide as a prompt for the first window (Default: empty)
Temperature
temperature to use for sampling (Default: 0)
Language
language that the audio is in; uses the detected language if None; use a two-letter language code (ISO 639-1), e.g. en, de, ja
Chunk Level
chunk level, either 'segment' or 'word'
Chunk Length S
chunk length in seconds to split audio (Default: 30, 1 ≤ chunk_length_s ≤ 30)
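The settings above can be collected and checked before a request is sent. The following is a minimal sketch, not the service's client API: only `chunk_length_s` appears as an identifier on this page, so the other field names (`task`, `initial_prompt`, `temperature`, `language`, `chunk_level`, `service_tier`) are snake_case guesses derived from the labels, and the defaults for `task` and `chunk_level` are assumptions. The helper `build_settings` is hypothetical.

```python
from typing import Any, Dict, Optional


def build_settings(
    task: str = "transcribe",        # assumed default; not stated on this page
    initial_prompt: str = "",        # Default: empty
    temperature: float = 0.0,        # Default: 0
    language: Optional[str] = None,  # None means "use detected language"
    chunk_level: str = "segment",    # assumed default; 'segment' or 'word'
    chunk_length_s: int = 30,        # Default: 30, must satisfy 1 <= x <= 30
    service_tier: Optional[str] = None,  # e.g. 'priority' where supported
) -> Dict[str, Any]:
    """Validate the documented constraints and return a settings dict."""
    if chunk_level not in ("segment", "word"):
        raise ValueError("chunk_level must be 'segment' or 'word'")
    if not 1 <= chunk_length_s <= 30:
        raise ValueError("chunk_length_s must be between 1 and 30")
    if language is not None:
        # ISO 639-1 codes are two letters, e.g. 'en', 'de', 'ja'.
        if len(language) != 2 or not language.isalpha():
            raise ValueError("language must be a two-letter ISO 639-1 code")
        language = language.lower()

    settings: Dict[str, Any] = {
        "task": task,
        "initial_prompt": initial_prompt,
        "temperature": temperature,
        "chunk_level": chunk_level,
        "chunk_length_s": chunk_length_s,
    }
    # Optional fields are only included when explicitly set.
    if language is not None:
        settings["language"] = language
    if service_tier is not None:
        settings["service_tier"] = service_tier
    return settings


if __name__ == "__main__":
    print(build_settings(language="de", chunk_level="word", chunk_length_s=15))
```

Validating locally catches out-of-range values such as a `chunk_length_s` of 31 before any audio is uploaded; the actual field names and accepted values should be confirmed against the provider's API reference.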