openai/whisper-base

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680,000 hours of labelled audio and generalizes well to many datasets and domains without fine-tuning. The model uses a Transformer encoder-decoder architecture. The multilingual Whisper checkpoints support many languages, including English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, and Korean.

Public
$0.0005/sec
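
As a rough illustration of the per-second price above, the following sketch estimates the cost of a single request. It assumes the charge scales linearly with the number of billed seconds; the page does not say whether those seconds are audio duration or compute time, and `estimate_cost` and `PRICE_PER_SECOND_USD` are hypothetical names used only for this example.

```python
from decimal import Decimal

# Price quoted on this page, in USD per billed second.
PRICE_PER_SECOND_USD = Decimal("0.0005")


def estimate_cost(billed_seconds):
    """Estimate the cost in USD, assuming purely linear per-second billing."""
    if billed_seconds < 0:
        raise ValueError("billed_seconds must be non-negative")
    # Convert through str() so float inputs do not carry binary rounding noise.
    return Decimal(str(billed_seconds)) * PRICE_PER_SECOND_USD


if __name__ == "__main__":
    for seconds in (30, 90, 3600):
        print(f"{seconds} billed seconds: about ${estimate_cost(seconds)}")
```

Decimal arithmetic is used here so that small per-second amounts add up without floating-point drift.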

Input

Please upload an audio file

task to perform (transcribe or translate)

language that the audio is in; uses detected language if None. (Default: empty)

temperature to use for sampling (Default: 0)

patience value to use in beam decoding (Default: 1)

token ids to suppress during sampling. (Default: -1)

optional text to provide as a prompt for the first window. (Default: empty)

provide the previous output of the model as a prompt for the next window

temperature to increase when falling back when the decoding fails to meet either of the thresholds below (Default: 0.2)

gzip compression ratio threshold (Default: 2.4)

average log probability threshold (Default: -1)

probability of the <|nospeech|> token threshold (Default: 0.6)
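
The fallback and threshold parameters above interact, so a small sketch may help. It is modelled on the open-source Whisper reference implementation, and it is an assumption that the hosted model applies the same rules. All function names are hypothetical, and `zlib` stands in for the "gzip" compression mentioned above (both use DEFLATE).

```python
import zlib


def temperature_schedule(start=0.0, increment=0.2, maximum=1.0):
    """Temperatures to try in order: start, start + increment, ... up to maximum."""
    if increment is None or increment <= 0:
        return [start]
    temps, step = [], 0
    while round(start + step * increment, 6) <= maximum + 1e-6:
        temps.append(round(start + step * increment, 6))
        step += 1
    return temps


def compression_ratio(text):
    """Raw size divided by compressed size; repetitive text scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def needs_fallback(text, avg_logprob,
                   compression_ratio_threshold=2.4, logprob_threshold=-1.0):
    """True if the decoded text fails either threshold and should be retried
    at the next (higher) temperature in the schedule."""
    too_repetitive = compression_ratio(text) > compression_ratio_threshold
    too_uncertain = avg_logprob < logprob_threshold
    return too_repetitive or too_uncertain


def is_silence(no_speech_prob, avg_logprob,
               no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Assumed rule: treat a window as silence only when the no-speech
    probability is high AND the decoding was also low-confidence."""
    return no_speech_prob > no_speech_threshold and avg_logprob <= logprob_threshold


if __name__ == "__main__":
    print(temperature_schedule())
    print(needs_fallback("la " * 200, avg_logprob=-0.3))
    print(is_silence(no_speech_prob=0.9, avg_logprob=-1.5))
```

With the defaults listed above, decoding starts at temperature 0 and each failed attempt retries at a temperature 0.2 higher, stopping once the temperature would exceed 1.0 (the upper bound used in this sketch).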

© 2023 Deep Infra. All rights reserved.
