
Nemotron Model Family

The Nemotron family is a group of large language models developed by NVIDIA, specifically engineered to excel at generating high-quality synthetic data for training other, more powerful AI models. Unlike models focused solely on end-user chat or content creation, Nemotron's core strength lies in producing diverse and realistic text-based training examples—including question-answer pairs, instructions, and conversations—that are crucial for the "supervised fine-tuning" stage of AI development. By providing a robust toolkit for creating these datasets, Nemotron acts as a powerful "force multiplier" in the AI training pipeline, enabling developers to build more capable and refined specialized models efficiently and at scale, without relying solely on scarce, human-curated data.
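
As an illustration of the synthetic-data use case, the sketch below asks a Nemotron model to produce question-answer pairs on a topic, using DeepInfra's OpenAI-compatible endpoint shown in the example further down this page. The prompt wording and the generate_qa_pairs helper are illustrative assumptions, not part of any official Nemotron tooling.

# A minimal sketch of synthetic Q&A generation with a Nemotron model.
# Assumes the DeepInfra OpenAI-compatible endpoint shown later on this page;
# the prompt and the generate_qa_pairs helper are illustrative only.
from openai import OpenAI

client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

def generate_qa_pairs(topic: str, n: int = 5) -> str:
    """Ask the model for n question-answer pairs about a topic."""
    prompt = (
        f"Generate {n} diverse question-answer pairs about {topic} "
        "for supervised fine-tuning. Format each pair as:\n"
        "Q: <question>\nA: <answer>"
    )
    response = client.chat.completions.create(
        model="nvidia/Llama-3.1-Nemotron-70B-Instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(generate_qa_pairs("basic linear algebra"))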

Featured Model: nvidia/Llama-3.1-Nemotron-70B-Instruct

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses to user queries. The model scores 85.0 on Arena Hard, 57.6 on AlpacaEval 2 LC, and 8.98 on GPT-4-Turbo MT-Bench, three automatic alignment benchmarks known to be predictive of LMSys Chatbot Arena Elo. As of October 16, 2024, it ranked #1 on all three (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

  • Price per 1M input tokens: $0.60
  • Price per 1M output tokens: $0.60
  • Release Date: 10/16/2024
  • Context Size: 131,072 tokens
  • Quantization: fp8
  • License Type:

# Assume openai>=1.0.0
from openai import OpenAI

# Create an OpenAI client with your deepinfra token and endpoint
openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = openai.chat.completions.create(
    model="nvidia/Llama-3.1-Nemotron-70B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
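
Because usage reports prompt and completion token counts, a rough per-request cost can be estimated from the per-million-token prices listed above. The helper below is a back-of-the-envelope sketch, not DeepInfra's billing logic; actual charges appear on your usage dashboard.

# Back-of-the-envelope cost estimate for one request, using the
# $0.60 per 1M input / $0.60 per 1M output prices listed above.
# Illustrative only; actual billing comes from your DeepInfra dashboard.
INPUT_PRICE_PER_M = 0.60
OUTPUT_PRICE_PER_M = 0.60

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * INPUT_PRICE_PER_M
            + completion_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

usage = chat_completion.usage
print(f"~${estimate_cost(usage.prompt_tokens, usage.completion_tokens):.8f}")
# With the example output above (11 prompt + 25 completion tokens): ~$0.0000216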

Available Nemotron Models

NVIDIA Nemotron is a family of open models customized for efficiency, accuracy, and specialized workloads.

Model                              | Context | $ per 1M input tokens | $ per 1M output tokens
Llama-3.1-Nemotron-70B-Instruct    | 128k    | $0.60                 | $0.60
Llama-3.3-Nemotron-Super-49B-v1.5  | 128k    | $0.10                 | $0.40
NVIDIA-Nemotron-Nano-9B-v2         | 128k    | $0.04                 | $0.16
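
Because all three models sit behind the same OpenAI-compatible interface, switching between them is just a matter of changing the model string. The sketch below sends one prompt to each model in the table for a quick side-by-side comparison, reusing the openai client configured earlier on this page; it assumes each model ID follows the same nvidia/<model-name> pattern as the featured model, so check each model's page on DeepInfra for the exact ID.

# Send the same prompt to each Nemotron model listed above; only the model
# string changes between requests. The "nvidia/" prefix for the last two IDs
# is an assumption -- verify the exact IDs on each model's DeepInfra page.
NEMOTRON_MODELS = [
    "nvidia/Llama-3.1-Nemotron-70B-Instruct",
    "nvidia/Llama-3.3-Nemotron-Super-49B-v1.5",
    "nvidia/NVIDIA-Nemotron-Nano-9B-v2",
]

for model_id in NEMOTRON_MODELS:
    reply = openai.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": "Summarize what Nemotron is in one sentence."}],
    )
    print(model_id, "->", reply.choices[0].message.content)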

FAQ

How do I integrate Nemotron models into my application?

You can integrate Nemotron models seamlessly using DeepInfra’s OpenAI-compatible API. Just replace your existing base URL with DeepInfra’s endpoint and use your DeepInfra API key—no infrastructure setup required. DeepInfra also supports integration through libraries like openai, litellm, and other SDKs, making it easy to switch or scale your workloads instantly.
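
For the litellm route, a minimal sketch is shown below. It assumes litellm routes DeepInfra models via the "deepinfra/" model prefix and reads the DEEPINFRA_API_KEY environment variable; consult the litellm documentation for the exact provider syntax.

# Minimal litellm sketch; assumes the "deepinfra/" provider prefix and the
# DEEPINFRA_API_KEY environment variable (check the litellm docs to confirm).
import os
from litellm import completion

os.environ["DEEPINFRA_API_KEY"] = "$DEEPINFRA_TOKEN"

response = completion(
    model="deepinfra/nvidia/Llama-3.1-Nemotron-70B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)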

What are the pricing details for using Nemotron models on DeepInfra?

Pricing is usage-based:
  • Input Tokens: between $0.04 and $0.60 per million
  • Output Tokens: between $0.16 and $0.60 per million
Prices vary slightly by model. There are no upfront fees, and you only pay for what you use.

How do I get started using Nemotron on DeepInfra?

  • Sign in with GitHub at deepinfra.com
  • Get your API key
  • Test models directly from the browser, cURL, or SDKs
  • Review pricing on your usage dashboard
Within minutes, you can deploy apps using Nemotron models—without any infrastructure setup.