🚀 New models by Bria.ai, generate and edit images at scale 🚀
The Nemotron family is a group of large language models developed by NVIDIA, specifically engineered to excel at generating high-quality synthetic data for training other, more powerful AI models. Unlike models focused solely on end-user chat or content creation, Nemotron's core strength lies in producing diverse and realistic text-based training examples—including question-answer pairs, instructions, and conversations—that are crucial for the "supervised fine-tuning" stage of AI development. By providing a robust toolkit for creating these datasets, Nemotron acts as a powerful "force multiplier" in the AI training pipeline, enabling developers to build more capable and refined specialized models efficiently and at scale, without relying solely on scarce, human-curated data.
Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo. As of 16th Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.
Price per 1M input tokens
$0.60
Price per 1M output tokens
$0.60
Release Date
10/16/2024
Context Size
131,072
Quantization
fp8
# Assume openai>=1.0.0
from openai import OpenAI
# Create an OpenAI client with your deepinfra token and endpoint
openai = OpenAI(
api_key="$DEEPINFRA_TOKEN",
base_url="https://api.deepinfra.com/v1/openai",
)
chat_completion = openai.chat.completions.create(
model="nvidia/Llama-3.1-Nemotron-70B-Instruct",
messages=[{"role": "user", "content": "Hello"}],
)
print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
NVIDIA Nemotron is a family of open models customized for efficiency, accuracy, and specialized workloads.
Model | Context | $ per 1M input tokens | $ per 1M output tokens | Actions |
---|---|---|---|---|
Llama-3.1-Nemotron-70B-Instruct | 128k | $0.60 | $0.60 | |
Llama-3.3-Nemotron-Super-49B-v1.5 | 128k | $0.10 | $0.40 | |
NVIDIA-Nemotron-Nano-9B-v2 | 128k | $0.04 | $0.16 |
© 2025 Deep Infra. All rights reserved.