To use DeepInfra's API, you'll need an API key.
You'll use this API key in your requests to authenticate with our services.
Whisper is a speech-to-text model from OpenAI. Given an audio file containing speech, it produces a transcript with per-sentence timestamps. It comes in several sizes (small, base, large, etc.), with variants for English; see more at deepinfra.com. By default, Whisper segments its timestamps by sentence. We also host whisper-timestamped, which can provide timestamps for individual words in the audio. You can use it with our REST API. Here's how:
curl -X POST \
-F "audio=@/home/user/all-in-01.mp3" \
-H "Authorization: Bearer YOUR_API_KEY" \
'https://api.deepinfra.com/v1/inference/openai/whisper-timestamped-medium.en'
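For repeated use, the same request can be wrapped in a small shell function. This is a minimal sketch, not an official client: the function names `inference_url` and `transcribe` and the `DEEPINFRA_API_KEY` environment variable are assumptions made for this example, and the response is printed as returned without any parsing.

```shell
#!/bin/sh
# Sketch only: wraps the curl call shown above.
# DEEPINFRA_API_KEY, inference_url and transcribe are hypothetical names
# chosen for this example; they are not part of the DeepInfra API.

API_BASE="https://api.deepinfra.com/v1/inference"

# Build the inference endpoint URL for a given model name.
inference_url() {
    printf '%s/%s\n' "$API_BASE" "$1"
}

# Upload an audio file and print the raw response body.
# Usage: transcribe AUDIO_FILE [MODEL]
transcribe() {
    audio_file=$1
    model=${2:-openai/whisper-timestamped-medium.en}

    # Refuse to send a request without a key or an input file.
    if [ -z "${DEEPINFRA_API_KEY:-}" ]; then
        echo "error: DEEPINFRA_API_KEY is not set" >&2
        return 1
    fi
    if [ ! -f "$audio_file" ]; then
        echo "error: audio file not found: $audio_file" >&2
        return 1
    fi

    # --fail makes curl exit non-zero on HTTP error responses.
    curl --fail --silent --show-error -X POST \
        -F "audio=@${audio_file}" \
        -H "Authorization: Bearer ${DEEPINFRA_API_KEY}" \
        "$(inference_url "$model")"
}
```

After sourcing the file and exporting your key, `transcribe /home/user/all-in-01.mp3` sends the same request as the curl command above. Reading the key from the environment keeps it out of scripts and shell history.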
To see additional parameters and how to call this model, check out the documentation page for the complete API reference and examples.
If you have any questions, just reach out to us on our Discord server.
© 2026 Deep Infra. All rights reserved.