Compare Llama2 vs OpenAI models for FREE.
Published on 2023.09.28 by Nikola Borisov

At DeepInfra we host the best open-source LLMs, and we are always working to keep our APIs simple and easy to use.

Today we are excited to announce an easy way to try our models, such as Llama2 70b and Mistral 7b, and compare them to OpenAI's models. You only need to change the API endpoint URL and the model name to see whether these models are a good fit for your application.

Here is a quick example of how to use the OpenAI Python client with our models:

import openai  # this example uses the pre-1.0 interface of the openai package

# Point the OpenAI client to our endpoint
openai.api_base = "https://api.deepinfra.com/v1/openai"
# Just leave the API key empty. You don't need it to try our models.
openai.api_key = ""

# Your chosen model here
MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"
chat_completion = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=True,
)

# Print the choices of each streamed chunk as it arrives
for event in chat_completion:
    print(event.choices)
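The example prints the raw `choices` of every streamed chunk. As a minimal sketch, assuming each chunk is dict-like and follows the OpenAI streaming format (a `choices` list whose first item carries a `delta` with an optional `content` string), the fragments can be joined into the full reply. `collect_stream_text` is a hypothetical helper name, not part of the openai package.

```python
def collect_stream_text(events):
    """Join the text fragments of streamed chat-completion chunks.

    Assumes each event is dict-like and shaped like an OpenAI streaming
    chunk: {"choices": [{"delta": {"content": "..."}}]}.
    """
    parts = []
    for event in events:
        choices = event.get("choices") or []
        if not choices:
            continue  # skip chunks that carry no choices
        delta = choices[0].get("delta") or {}
        content = delta.get("content")
        if content:  # role-only and final chunks have no content
            parts.append(content)
    return "".join(parts)
```

With the client set up as in the example, `print(collect_stream_text(chat_completion))` would replace the printing loop.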

Rate limits without an API key

To make it as simple as possible, you don't even have to create an account with DeepInfra to try our models. Just pass an empty string as api_key and you are good to go. We rate limit unauthenticated requests by IP address.
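Because unauthenticated requests are rate limited, a quick experiment may occasionally be rejected. The post does not state the limits or how a rejected request is reported, so the following is only a sketch: it assumes the client raises an exception when a request is rate limited, and `call_with_backoff` and its parameters are hypothetical names chosen for illustration.

```python
import random
import time


def call_with_backoff(request, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call request() and retry with exponential backoff on failure.

    retry_on is an assumption: pass the exception type(s) your client
    raises when a request is rate limited.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            # Wait 1x, 2x, 4x, ... the base delay, plus a little jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))
```

Wrapping the request in a zero-argument function, for example `call_with_backoff(lambda: openai.ChatCompletion.create(...))`, keeps the retry logic separate from the call itself.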

Pricing and production use

When you are ready to use our models in production, you can create an account at DeepInfra and get an API key. We offer the best pricing for the Llama 2 70b model at just $1 per 1M tokens. If you need any help, just reach out to us on our Discord server.
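To get a feel for what $1 per 1M tokens means for a workload, here is a rough estimate in code. It assumes a single flat rate applied to the total token count, which is all the post states; `estimate_cost_usd` is a hypothetical helper, not a DeepInfra API.

```python
def estimate_cost_usd(total_tokens, usd_per_million_tokens=1.0):
    """Estimate cost in USD for a token count at a flat per-1M-token price."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * usd_per_million_tokens


print(estimate_cost_usd(250_000))  # 0.25
```

At this rate, 250,000 tokens would cost about $0.25.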
