
At DeepInfra we host the best open-source LLMs, and we are always working to keep our APIs simple and easy to use.
Today we are excited to announce an easy way to try our models, such as Llama 2 70B and Mistral 7B, and compare them to OpenAI's models. You only need to change the API endpoint URL and the model name to see whether these models are a good fit for your application.
Here is a quick example of how to use the OpenAI Python client with our models:
import openai  # this example uses the pre-1.0 openai package (openai.ChatCompletion)

# Point OpenAI client to our endpoint
openai.api_base = "https://api.deepinfra.com/v1/openai"
# Just leave the API key empty. You don't need it to try our models.
openai.api_key = ""

# Your chosen model here
MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"

chat_completion = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=True,
)

# Print the choices of each streamed chunk as it arrives
for event in chat_completion:
    print(event.choices)
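With stream=True the reply arrives in small chunks, so printing event.choices shows fragments rather than the finished message. Here is a minimal sketch of how the fragments could be joined into one string. It assumes each chunk follows the OpenAI streaming shape (a "choices" list whose first item carries a "delta" mapping with optional "content") and that a single completion is requested. The helper name collect_stream_text and the sample chunks are made up for illustration.

```python
from typing import Any, Iterable, Mapping


def collect_stream_text(events: Iterable[Mapping[str, Any]]) -> str:
    """Join the text fragments of streamed chat-completion chunks."""
    parts = []
    for event in events:
        choices = event.get("choices") or []
        if not choices:
            continue
        # Assumption: one completion per request, so only choices[0] matters.
        # Role-only and final chunks may omit "content" or set it to None.
        delta = choices[0].get("delta") or {}
        content = delta.get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Hand-written chunks standing in for a real stream (illustrative data only).
sample_events = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": " there"}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
print(collect_stream_text(sample_events))  # prints "Hello there"
```

In the example above, the same helper could be called as collect_stream_text(chat_completion), since the chunk objects returned by the pre-1.0 client behave like dictionaries.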
To make it as simple as possible, you don't even have to create an account with DeepInfra to try our models. Just pass an empty string as api_key and you are good to go. Unauthenticated requests are rate limited by IP address.
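Because unauthenticated requests are rate limited, a script that sends many of them may want to wait and retry when a request is rejected. The sketch below shows one common approach, exponential backoff. It is not part of the DeepInfra API: the helper name call_with_backoff, the default delays, and the idea that a rate-limited request surfaces as an exception are all assumptions made for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    request: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying with exponential backoff on retry_on errors."""
    for attempt in range(max_attempts):
        try:
            return request()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

With the pre-1.0 openai package, the request could be a lambda wrapping openai.ChatCompletion.create(...), and retry_on would likely be (openai.error.RateLimitError,), assuming the endpoint reports rate limiting the way OpenAI's does.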
When you are ready to use our models in production, you can create an account at DeepInfra and get an API key. We offer the best pricing for the Llama 2 70B model at just $1 per 1M tokens. If you need any help, just reach out to us on our Discord server.
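As a rough sketch of the move to production, the snippet below reads the API key from an environment variable instead of hard-coding it, and estimates cost from the $1 per 1M tokens price quoted above. The variable name DEEPINFRA_API_KEY and both function names are hypothetical, and the estimate assumes prompt and completion tokens are billed at the same flat rate.

```python
import os
from typing import Mapping

DEEPINFRA_API_BASE = "https://api.deepinfra.com/v1/openai"
PRICE_PER_MILLION_TOKENS_USD = 1.0  # Llama 2 70B price quoted in this post


def client_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return api_base and api_key; the key is empty if none is configured."""
    # DEEPINFRA_API_KEY is a hypothetical variable name chosen for this sketch.
    return {
        "api_base": DEEPINFRA_API_BASE,
        "api_key": env.get("DEEPINFRA_API_KEY", ""),
    }


def estimate_cost_usd(total_tokens: int) -> float:
    """Estimate cost assuming one flat per-token rate for all tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


print(estimate_cost_usd(250_000))  # prints 0.25
```

The returned settings map directly onto openai.api_base and openai.api_key, so the same code runs unauthenticated during a trial and authenticated once the key is set.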
© 2026 Deep Infra. All rights reserved.