Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrates a strong ability to generalize to many datasets and domains without fine-tuning. Whisper is a Transformer-based encoder-decoder model trained on either English-only or multilingual data. The English-only models were trained on speech recognition, while the multilingual models were trained on both speech recognition and speech translation.
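As a minimal sketch of how a demo like this could call the model (the `openai/whisper-small` checkpoint and the use of the 🤗 Transformers pipeline are assumptions; the page does not state which backend or checkpoint it uses):

```python
# Minimal transcription sketch using the Hugging Face Transformers ASR pipeline.
# The checkpoint name is an assumption; any Whisper checkpoint can be substituted.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

result = asr("audio.mp3")  # path to a local audio file
print(result["text"])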
Please upload an audio file and adjust the decoding options below (a usage sketch mapping these options to code follows the list):

- Task to perform: transcribe (speech recognition) or translate (speech translation)
- Language that the audio is in; the detected language is used if left empty (Default: empty)
- Temperature to use for sampling (Default: 0)
- Patience value to use in beam decoding (Default: 1)
- Token ids to suppress during sampling (Default: -1)
- Optional text to provide as a prompt for the first window (Default: empty)
- Whether to provide the previous output of the model as a prompt for the next window
- Temperature to increase by when falling back because the decoding fails to meet either of the thresholds below (Default: 0.2)
- Gzip compression ratio threshold: the decoding is treated as failed if the compression ratio of the output is higher than this value (Default: 2.4)
- Average log probability threshold: the decoding is treated as failed if the average log probability is lower than this value (Default: -1)
- <|nospeech|> token probability threshold: the segment is treated as silence if this probability is higher than the threshold and the decoding has failed (Default: 0.6)
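These options mirror the decoding arguments of the reference openai-whisper package. Below is a hedged sketch of how they could be passed to `transcribe()`; the package, the `base` checkpoint, and the exact label-to-argument mapping are assumptions, since this page only lists labels and defaults.

```python
# Sketch of the listed decoding options as openai-whisper keyword arguments.
# The label-to-argument mapping is assumed; names follow whisper.transcribe()
# and whisper.DecodingOptions.
import whisper

model = whisper.load_model("base")  # checkpoint size is an assumption

result = model.transcribe(
    "audio.mp3",
    task="transcribe",                 # or "translate" for X -> English speech translation
    language=None,                     # None -> use the detected language
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # start at 0, fall back in 0.2 steps
    beam_size=5,                       # beam search is required for patience to apply
    patience=1.0,                      # patience value used in beam decoding
    suppress_tokens="-1",              # "-1" suppresses the default set of special tokens
    initial_prompt=None,               # optional prompt text for the first window
    condition_on_previous_text=True,   # feed the previous output as prompt for the next window
    compression_ratio_threshold=2.4,   # gzip compression ratio threshold
    logprob_threshold=-1.0,            # average log probability threshold
    no_speech_threshold=0.6,           # <|nospeech|> probability threshold
)
print(result["text"])
```

With the temperature tuple above, a window is re-decoded at the next temperature whenever the compression ratio, average log probability, or no-speech checks flag the result as a failed decode.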