openai/whisper-medium

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labeled data and demonstrates strong abilities to generalize to various datasets and domains without fine-tuning. The model is based on a Transformer encoder-decoder architecture.

Public
$0.0005/sec
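
As a rough illustration of the per-second rate above, the sketch below estimates the cost of a request. The page does not say whether the billed seconds are audio duration or processing time, so `billed_seconds` is a hypothetical input, and the function name is made up for this example.

```python
# Rate quoted on this page, in US dollars per billed second.
RATE_USD_PER_SECOND = 0.0005


def estimate_cost(billed_seconds: float, rate: float = RATE_USD_PER_SECOND) -> float:
    """Return the estimated cost in USD for a number of billed seconds.

    Assumption: cost scales linearly with billed seconds and there is
    no minimum charge or rounding. The page does not confirm either.
    """
    if billed_seconds < 0:
        raise ValueError("billed_seconds must be non-negative")
    return billed_seconds * rate


if __name__ == "__main__":
    for seconds in (60, 600, 3600):
        print(f"{seconds:>5} s -> ${estimate_cost(seconds):.4f}")
```

Under these assumptions, one billed minute comes to about $0.03 and one billed hour to about $1.80.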

Input

audio file to transcribe or translate

task to perform (2 options)

language that the audio is in; uses detected language if None. (Default: empty)

temperature to use for sampling (Default: 0)

patience value to use in beam decoding (Default: 1)

token ids to suppress during sampling. (Default: -1)

optional text to provide as a prompt for the first window. (Default: empty)

provide the previous output of the model as a prompt for the next window (2 options)

amount by which to increase the temperature on fallback, when decoding fails to meet either of the thresholds below (Default: 0.2)

gzip compression ratio threshold (Default: 2.4)

average log probability threshold (Default: -1)

threshold on the probability of the <|nospeech|> token (Default: 0.6)
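
The sampling temperature, the fallback increment, and the three thresholds above work together. The sketch below shows one way that interaction could look, using the defaults listed on this page. It is modeled on the behavior of the open-source Whisper reference code, not on this service's implementation. `DecodeResult`, `decode_with_fallback`, `is_probably_silence`, the cap of 1.0 on the temperature, and the `decode` callback are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DecodeResult:
    """Hypothetical container for one decoding attempt."""
    text: str
    avg_logprob: float      # average log probability of the decoded tokens
    no_speech_prob: float   # probability of the <|nospeech|> token


def compression_ratio(text: str) -> float:
    """Ratio of raw to compressed size; highly repetitive text scores high.

    Uses zlib's DEFLATE (the algorithm behind gzip) as a stand-in.
    """
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def temperature_schedule(start: float = 0.0, increment: float = 0.2,
                         maximum: float = 1.0) -> List[float]:
    """Temperatures to try in order. The 1.0 ceiling is an assumption."""
    if increment <= 0:
        return [start]
    steps = int((maximum - start) / increment + 1e-6)
    return [round(start + i * increment, 6) for i in range(steps + 1)]


def decode_with_fallback(
    decode: Callable[[float], DecodeResult],
    temperature: float = 0.0,
    increment: float = 0.2,
    compression_ratio_threshold: float = 2.4,
    logprob_threshold: float = -1.0,
) -> Tuple[DecodeResult, float]:
    """Retry at higher temperatures until both thresholds are met."""
    schedule = temperature_schedule(temperature, increment)
    result, used = None, schedule[0]
    for used in schedule:
        result = decode(used)
        too_repetitive = compression_ratio(result.text) > compression_ratio_threshold
        too_unlikely = result.avg_logprob < logprob_threshold
        if not (too_repetitive or too_unlikely):
            break  # both thresholds met, accept this attempt
    return result, used


def is_probably_silence(result: DecodeResult, no_speech_threshold: float = 0.6,
                        logprob_threshold: float = -1.0) -> bool:
    """Treat a window as silence when <|nospeech|> is likely and the text is not."""
    return (result.no_speech_prob > no_speech_threshold
            and result.avg_logprob < logprob_threshold)
```

Here `decode` stands for whatever function runs the model on one audio window at a given temperature. A caller would pass it in, take the first result that meets both thresholds (or the last attempt if none does), and could then discard the window if `is_probably_silence` returns true.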

© 2023 Deep Infra. All rights reserved.
