Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labeled data, it generalizes to many domains without the need for fine-tuning. It is a Transformer-based encoder-decoder model, trained on either English-only or multilingual data, and predicts transcriptions in the same language as the audio or in a different one. Whisper checkpoints come in five configurations of varying model sizes.
Please upload an audio file
task to perform: transcribe or translate (Default: transcribe)
language spoken in the audio; auto-detected if left empty. (Default: empty)
temperature to use for sampling (Default: 0)
patience value to use in beam decoding (Default: 1)
comma-separated list of token ids to suppress during sampling. (Default: -1)
optional text to provide as a prompt for the first window. (Default: empty)
whether to provide the previous output of the model as a prompt for the next window (Default: True)
temperature increment to apply when decoding fails to meet either of the thresholds below (Default: 0.2)
gzip compression ratio threshold: if the compression ratio of the decoded text is higher than this value, treat the decoding as failed (Default: 2.4)
average log probability threshold: if the average log probability of the decoded tokens is lower than this value, treat the decoding as failed (Default: -1)
<|nospeech|> token probability threshold: if higher than this value and decoding failed due to the log probability threshold, treat the segment as silence (Default: 0.6)
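The fallback parameters above work together as a retry loop: decoding starts at the base temperature, and if the decoded window fails either the compression ratio check or the average log probability check, the temperature is raised by the increment and the window is decoded again. A minimal sketch of that loop, where `decode_window` and its return values are hypothetical stand-ins rather than Whisper's actual API:

```python
# Sketch of a Whisper-style temperature fallback loop. Assumes a hypothetical
# decode_window(audio, temperature) -> (text, avg_logprob, compression_ratio).

def temperature_schedule(base=0.0, increment=0.2, maximum=1.0):
    """Temperatures to try in order: base, base+increment, ... up to maximum."""
    temps = []
    t = base
    while t <= maximum + 1e-9:  # tolerance guards against float drift
        temps.append(round(t, 10))
        t += increment
    return temps

def decode_with_fallback(decode_window, audio,
                         compression_ratio_threshold=2.4,
                         logprob_threshold=-1.0,
                         base_temperature=0.0,
                         increment=0.2):
    """Retry decoding at increasing temperatures until both heuristics pass."""
    result = None
    for temp in temperature_schedule(base_temperature, increment):
        result = decode_window(audio, temp)
        text, avg_logprob, compression_ratio = result
        # Accept the decoding only if it passes both quality checks:
        # low compression ratio (not repetitive) and high enough log probability.
        if (compression_ratio <= compression_ratio_threshold
                and avg_logprob >= logprob_threshold):
            return result
    return result  # every temperature failed; keep the last attempt
```

A very repetitive output compresses well under gzip, so a high compression ratio is a cheap signal that the model is stuck in a loop; a low average log probability signals that the model is unsure of its own output. Raising the temperature adds sampling noise, which often breaks the model out of a repetition failure.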
You need to log in to use this model