How to use OpenAI Whisper with per-sentence and per-word timestamp segmentation on DeepInfra
Published on 2023.04.05 by Yessen Kanapin

Getting started

To use DeepInfra's API, you'll need an API key.

  1. Sign up or log in to your DeepInfra account
  2. Navigate to the Dashboard / API Keys section
  3. Create a new API key if you don't have one already

You'll use this API key in your requests to authenticate with our services.
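
As a minimal sketch of keeping the key out of source code, the snippet below reads it from an environment variable and builds the bearer header. The variable name `DEEPINFRA_API_KEY` and the helper `auth_headers` are assumptions made for illustration; the `Authorization: Bearer` format is the one used in the curl request in this post.

```python
import os


def auth_headers(api_key):
    """Build the bearer-token header used to authenticate a request."""
    if not api_key:
        raise ValueError("API key is empty; create one under Dashboard / API Keys")
    return {"Authorization": f"Bearer {api_key}"}


# DEEPINFRA_API_KEY is an assumed variable name, not an official one.
# The fallback is the same placeholder used in the curl example.
api_key = os.environ.get("DEEPINFRA_API_KEY", "YOUR_API_KEY")
headers = auth_headers(api_key)
```

The resulting dictionary can be passed to whichever HTTP client you prefer.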

Running speech recognition

Whisper is a speech-to-text model from OpenAI. Given an audio file containing speech, it produces a transcript with per-sentence timestamps. It comes in several sizes (small, base, large, etc.) and in English-only variants; see deepinfra.com for the full list. By default, Whisper segments its timestamps by sentence. We also host whisper-timestamped, which provides timestamps for individual words in the audio. You can call it through our REST API. Here's how to use it:

curl -X POST \
  -F "audio=@/home/user/all-in-01.mp3" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  'https://api.deepinfra.com/v1/inference/openai/whisper-timestamped-medium.en'

For additional parameters and other ways to call this model, check out the documentation page, which has the complete API reference and examples.
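
Once the request returns, the per-word timestamps still have to be pulled out of the JSON body. This post does not describe the response schema, so the sketch below assumes a whisper-timestamped style layout: a top-level `segments` list in which each segment carries a `words` list of `text`, `start`, and `end` entries. The helper name `iter_words` and the sample payload are hypothetical; check the documentation page for the actual field names.

```python
import json


def iter_words(result):
    """Yield (word, start, end) tuples from a decoded response dictionary.

    Assumed layout: result["segments"][i]["words"][j] holds "text",
    "start", and "end" (times in seconds). Segments that have no
    "words" list are skipped.
    """
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            yield word["text"], word["start"], word["end"]


# Hypothetical response body shaped like the assumption above
# (illustrative values, not real model output).
sample = json.loads("""
{
  "text": "Hello world.",
  "segments": [
    {"start": 0.0, "end": 1.2, "text": "Hello world.",
     "words": [
       {"text": "Hello", "start": 0.0, "end": 0.5},
       {"text": "world.", "start": 0.6, "end": 1.2}
     ]}
  ]
}
""")

for text, start, end in iter_words(sample):
    print(f"{start:6.2f} - {end:6.2f}  {text}")
```

The same loop works on per-sentence output if you read `start`, `end`, and `text` from each segment instead of from its words.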

If you have any questions, just reach out to us on our Discord server.
