How to use OpenAI Whisper with per-sentence and per-word timestamp segmentation using DeepInfra

Published on 2023.04.05 by Yessen Kanapin


Getting started

First install the deepctl command line tool.

curl https://deepinfra.com/get.sh | sh

Log in to DeepInfra (using your GitHub account):

deepctl login

This will open your browser so you can log in to DeepInfra with your GitHub account. When you are done, come back to the terminal.

Running speech recognition

Whisper is a speech-to-text model from OpenAI. Given an audio file with voice data, it produces a transcription of the speech with per-sentence timestamps. There are different model sizes (small, base, large, etc.) and English-only variants; see more at deepinfra.com. By default, Whisper produces per-sentence timestamp segmentation. We also host whisper-timestamped, which can provide timestamps for individual words in the audio. You can use it with either our REST API or our deepctl command line tool. Here is how to use it with the command line tool:

deepctl infer -m 'openai/whisper-timestamped-medium.en' \
              -i audio=@/home/user/all-in-01.mp3
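If you prefer plain HTTP, the same inference can be run over the REST API. The call below is a sketch assuming DeepInfra's standard inference endpoint; $AUTH_TOKEN stands for your own DeepInfra API token:

curl -X POST \
     -H "Authorization: bearer $AUTH_TOKEN" \
     -F audio=@/home/user/all-in-01.mp3 \
     https://api.deepinfra.com/v1/inference/openai/whisper-timestamped-medium.en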

To see additional parameters and how to call this model, check out the documentation page or use the command line tool:

deepctl model info -m openai/whisper-base
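The inference call returns JSON. As a rough sketch, assuming the usual whisper-timestamped output layout, where each entry in the segments array carries a words list with start and end times (check the documentation page for the exact schema), you could list the per-word timestamps with jq:

deepctl infer -m 'openai/whisper-timestamped-medium.en' \
              -i audio=@/home/user/all-in-01.mp3 \
    | jq '.segments[].words[] | {text, start, end}'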

If you have any questions, just reach out to us on our Discord server.