openai/whisper-timestamped-medium cover image

openai/whisper-timestamped-medium

Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This version has implementation to predict word timestamps and provide a more accurate estimation of speech segments when transcribing with Whisper models.

Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This version has implementation to predict word timestamps and provide a more accurate estimation of speech segments when transcribing with Whisper models.

Public

Input

Please upload an audio file

task to perform 2

optional text to provide as a prompt for the first window.. (Default: empty)

temperature to use for sampling (Default: 0)

language that the audio is in; uses detected language if None. (Default: empty)

You need to login to use this model

Output