
Many users requested longer context models to help them summarize bigger chunks of text or write novels with ease.
We're proud to announce our selection of long-context models, which will keep growing in the coming weeks.
Mistral-based models have a context size of 32k tokens, and Amazon recently released a model fine-tuned specifically on longer contexts.
We also recently released the highly praised Yi models. Keep in mind that they don't support chat, only old-school text completion (new models are in the works).
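Because a completion-only model simply continues whatever text it is given, a summarization request has to be written as a single prompt rather than a list of chat messages, and long inputs still have to fit the context window. The sketch below illustrates one way to do both. The function names `build_completion_prompt` and `split_into_chunks` are hypothetical, and the figure of roughly 4 characters per token is a crude assumption rather than a real tokenizer count, so treat the budget as an estimate only.

```python
def build_completion_prompt(
    text: str,
    instruction: str = "Summarize the text above in a few sentences.",
) -> str:
    """Wrap a document in a plain-text prompt that a completion model can continue."""
    # The trailing "Summary:" cue invites the model to continue with the summary.
    return f"Text:\n{text.strip()}\n\n{instruction}\n\nSummary:"


def split_into_chunks(
    text: str,
    max_tokens: int = 32_000,      # context size mentioned above
    chars_per_token: int = 4,      # ASSUMPTION: rough heuristic, not a tokenizer
    reserve_tokens: int = 1_000,   # ASSUMPTION: room left for instructions and output
) -> list[str]:
    """Split text on paragraph boundaries so each chunk fits a rough character budget."""
    budget = (max_tokens - reserve_tokens) * chars_per_token
    if budget <= 0:
        raise ValueError("reserve_tokens must be smaller than max_tokens")

    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= budget:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single paragraph larger than the budget is hard-split.
        while len(para) > budget:
            chunks.append(para[:budget])
            para = para[budget:]
        current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed through `build_completion_prompt` and sent to the model separately, with the partial summaries combined in a final pass if needed.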
© 2026 Deep Infra. All rights reserved.