nvidia/Llama-3.1-Nemotron-70B-Instruct cover image
featured

nvidia/Llama-3.1-Nemotron-70B-Instruct

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo. As of 16th Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo. As of 16th Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

Public
$0.35/$0.40 in/out Mtoken
131,072
JSON
Function
ProjectPaperLicense
nvidia/Llama-3.1-Nemotron-70B-Instruct cover image

Llama-3.1-Nemotron-70B-Instruct

Ask me anything

0.00s

Model Overview

Description:

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.

This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo

As of 1 Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

This model was trained using RLHF (specifically, REINFORCE), Llama-3.1-Nemotron-70B-Reward and HelpSteer2-Preference prompts on a Llama-3.1-70B-Instruct model as the initial policy.

Llama-3.1-Nemotron-70B-Instruct-HF has been converted from Llama-3.1-Nemotron-70B-Instruct to support it in the HuggingFace Transformers codebase. Please note that evaluation results might be slightly different from the Llama-3.1-Nemotron-70B-Instruct as evaluated in NeMo-Aligner, which the evaluation results below are based on.

Try hosted inference for free at build.nvidia.com - it comes with an OpenAI-compatible API interface.

See details on our paper at https://arxiv.org/abs/2410.01257 - as a preview, this model can correctly the question How many r in strawberry? without specialized prompting or additional reasoning tokens:

A sweet question!
Let’s count the “R”s in “strawberry”:
1. S
2. T
3. R
4. A
5. W
6. B
7. E
8. R
9. R
10. Y
There are **3 “R”s** in the word “strawberry”.

Note: This model is a demonstration of our techniques for improving helpfulness in general-domain instruction following. It has not been tuned for performance in specialized domains such as math.

Terms of use

By accessing this model, you are agreeing to the LLama 3.1 terms and conditions of the license, acceptable use policy and Meta’s privacy policy

Evaluation Metrics

As of 1 Oct 2024, Llama-3.1-Nemotron-70B-Instruct performs best on Arena Hard, AlpacaEval 2 LC (verified tab) and MT Bench (GPT-4-Turbo)

ModelArena HardAlpacaEvalMT-BenchMean Response Length
Details(95% CI)2 LC (SE)(GPT-4-Turbo)(# of Characters for MT-Bench)
Llama-3.1-Nemotron-70B-Instruct85.0 (-1.5, 1.5)57.6 (1.65)8.982199.8
Llama-3.1-70B-Instruct55.7 (-2.9, 2.7)38.1 (0.90)8.221728.6
Llama-3.1-405B-Instruct69.3 (-2.4, 2.2)39.3 (1.43)8.491664.7
Claude-3-5-Sonnet-2024062079.2 (-1.9, 1.7)52.4 (1.47)8.811619.9
GPT-4o-2024-05-1379.3 (-2.1, 2.0)57.5 (1.47)8.741752.2

References(s):

Model Architecture:

Architecture Type: Transformer <br> Network Architecture: Llama 3.1 <br>

Input:

Input Type(s): Text <br> Input Format: String <br> Input Parameters: One Dimensional (1D) <br> Other Properties Related to Input: Max of 128k tokens<br>

Output:

Output Type(s): Text <br> Output Format: String <br> Output Parameters: One Dimensional (1D) <br> Other Properties Related to Output: Max of 4k tokens <br>

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.