
openai/whisper-timestamped-medium

Whisper is a set of multilingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps for speech segments (usually accurate to within about one second), but the original models cannot predict word-level timestamps. This version adds an implementation that predicts word timestamps and provides a more accurate estimation of speech segments when transcribing with Whisper models.

Public
$0.0005/sec

HTTP/cURL API
Input fields

audio (string)

audio to transcribe


task (string)

task to perform

Default value: transcribe

Allowed values: transcribe, translate


language (string)

language that the audio is in; uses detected language if None


temperature (number)

temperature to use for sampling

Default value: 0


patience (number)

patience value to use in beam decoding

Default value: 1


suppress_tokens (string)

token ids to suppress during sampling

Default value: -1


initial_prompt (string)

optional text to provide as a prompt for the first window


condition_on_previous_text (boolean)

provide the previous output of the model as a prompt for the next window

Default value: true


temperature_increment_on_fallback (number)

amount by which to increase the temperature when decoding fails to meet either of the thresholds below

Default value: 0.2


compression_ratio_threshold (number)

gzip compression ratio threshold

Default value: 2.4


logprob_threshold (number)

average log probability threshold

Default value: -1


no_speech_threshold (number)

threshold on the probability of the <|nospeech|> token

Default value: 0.6
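The fallback parameters above work together: decoding starts at `temperature`, and if the result's gzip compression ratio exceeds `compression_ratio_threshold` or its average log probability falls below `logprob_threshold`, decoding is retried at a higher temperature (raised by `temperature_increment_on_fallback`). A simplified sketch of that loop, modeled on how Whisper-style decoding behaves (the `decode` callable and its returned metrics dict are illustrative stand-ins, not the library's actual API):

```python
def decode_with_fallback(decode, temperature=0.0, increment=0.2,
                         compression_ratio_threshold=2.4,
                         logprob_threshold=-1.0):
    """Retry decoding at rising temperatures until both quality checks pass.

    `decode` is a hypothetical callable taking a temperature and returning a
    dict with `compression_ratio` and `avg_logprob` metrics for its output.
    """
    t = temperature
    while True:
        result = decode(t)
        # Accept the result only if it passes both quality thresholds.
        if (result["compression_ratio"] <= compression_ratio_threshold
                and result["avg_logprob"] >= logprob_threshold):
            return result
        if t >= 1.0:
            # Give up after reaching the maximum temperature; return the
            # last attempt even though it failed the checks.
            return result
        t = min(1.0, t + increment)
```

Raising the temperature makes sampling less greedy, which often breaks the model out of repetitive or degenerate transcriptions (the kind that produce a high compression ratio or low average log probability).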


webhook (file)

the webhook to call when inference is done; by default you will get the output in the response of your inference request
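The fields above can be assembled into a JSON request body for the HTTP API. A minimal sketch in Python; the endpoint URL is an assumption based on the model name shown above, and the expected encoding of the `audio` field (URL vs. base64 data) should be confirmed against Deep Infra's own documentation:

```python
import json

# Hypothetical endpoint for illustration; check Deep Infra's docs for the
# real inference URL and required authentication headers.
API_URL = "https://api.deepinfra.com/v1/inference/openai/whisper-timestamped-medium"

def build_request(audio, task="transcribe", language=None, temperature=0):
    """Assemble the documented input fields into a JSON request body.

    Only fields that differ from the server-side defaults need to be sent;
    `language` is omitted when None so the model auto-detects it.
    """
    body = {"audio": audio, "task": task, "temperature": temperature}
    if language is not None:
        body["language"] = language
    return body

# Example body; the audio value here is a placeholder, not real data.
print(json.dumps(build_request("<audio data>", language="en")))
```

Sending this body as a POST request (with an API token) to the inference endpoint should return the transcription, including word-level timestamps, unless a `webhook` is supplied, in which case the output is delivered there instead.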

Input Schema

Output Schema


© 2023 Deep Infra. All rights reserved.
