NVIDIA Nemotron 3 Super - blazing-fast agentic AI, ready to deploy today!

Note: The list of supported base models is listed on the same page. If you need a base model that is not listed, please contact us at feedback@deepinfra.com
Rate limit will apply on combined traffic of all LoRA adapter models with the same base model. For example, if you have 2 LoRA adapter models with the same base model, and have rate limit of 200. Those 2 LoRA adapter models combined will have rate limit of 200.
Pricing is 50% higher than base model.
LoRA adapter model speed is lower than base model, because there is additional compute and memory overhead to apply the LoRA adapter. From our benchmarks, the LoRA adapter model speed is about 50-60% slower than base model.
You could merge the LoRA adapter with the base model to reduce the overhead. And use custom deployment, the speed will be close to the base model.
Use OpenAI API clients with LLaMasGetting started
# create a virtual environment
python3 -m venv .venv
# activate environment in current shell
. .venv/bin/activate
# install openai python client
pip install openai
Choose a model
meta-llama/Llama-2-70b-chat-hf
[meta-llama/L...
NVIDIA Nemotron API Pricing Guide 2026<p>While everyone knows Llama 3 and Qwen, a quieter revolution has been happening in NVIDIA’s labs. They have been taking standard Llama models and “supercharging” them using advanced alignment techniques and pruning methods. The result is Nemotron—a family of models that frequently tops the “Helpfulness” leaderboards (like Arena Hard), often beating GPT-4o while being significantly […]</p>
Reliable JSON-Only Responses with DeepInfra LLMs<p>When large language models are used inside real applications, their role changes fundamentally. Instead of chatting with users, they become infrastructure components: extracting information, transforming text, driving workflows, or powering APIs. In these scenarios, natural language is no longer the desired output. What applications need is structured data — and very often, that structure is […]</p>
© 2026 Deep Infra. All rights reserved.