
At DeepInfra, we host the best open-source LLMs, and we are always working to make our APIs simple and easy to use.
Today we are excited to announce an easy way to try our models, such as Llama 2 70B and Mistral 7B, and compare them with OpenAI's models. You only need to change the API endpoint URL and the model name to see whether these models are a good fit for your application.
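Because the endpoint accepts the same request format as OpenAI's Chat Completions API, switching providers comes down to those two values. The sketch below uses only the Python standard library to build (but not send) such a request. The provider table, the `build_chat_request` helper, the OpenAI model name `gpt-3.5-turbo`, and leaving out the Authorization header when no key is given are illustrative assumptions; the `/chat/completions` path is the one the OpenAI client appends to its base URL.

```python
import json
import urllib.request

# Hypothetical provider table: only the base URL and the model name differ.
# The DeepInfra URL and Llama 2 model come from this post; the OpenAI entry
# and "gpt-3.5-turbo" are illustrative assumptions.
PROVIDERS = {
    "deepinfra": ("https://api.deepinfra.com/v1/openai", "meta-llama/Llama-2-70b-chat-hf"),
    "openai": ("https://api.openai.com/v1", "gpt-3.5-turbo"),
}


def build_chat_request(provider: str, prompt: str, api_key: str = "") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    base_url, model = PROVIDERS[provider]
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: with no key, send no Authorization header at all.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request("deepinfra", "Hello world")
print(req.full_url)  # https://api.deepinfra.com/v1/openai/chat/completions
```

Sending it is then a matter of `urllib.request.urlopen(req)`; comparing providers means calling `build_chat_request` with a different key into the table while the prompt stays the same.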
Here is a quick example of how to use the OpenAI Python client with our models:
```python
# This example uses the pre-1.0 OpenAI Python client: pip install "openai<1.0"
import openai

# Point the OpenAI client to our endpoint
openai.api_base = "https://api.deepinfra.com/v1/openai"
# Just leave the API key empty. You don't need it to try our models.
openai.api_key = ""

# Your chosen model here
MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"

chat_completion = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=True,
)

# Print the streamed reply as it arrives
for event in chat_completion:
    delta = event.choices[0].delta
    # Some chunks (e.g. the first or last) carry no text
    print(delta.get("content") or "", end="", flush=True)
print()
```
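With `stream=True`, the reply arrives in pieces rather than as one response. In OpenAI's streaming format, each piece is a server-sent event line of the form `data: {...}` carrying a `choices[0].delta.content` fragment, and the stream ends with `data: [DONE]`. Here is a minimal sketch of reassembling the text from such lines, assuming DeepInfra's OpenAI-compatible endpoint follows the same format and a single choice per request (the `collect_stream_text` helper and the sample events are hypothetical):

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas from OpenAI-style server-sent event lines.

    Assumes each event is a line 'data: {json}' whose JSON holds
    choices[0].delta.content, and that the stream ends with 'data: [DONE]'.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            delta = choices[0].get("delta", {})
            # The first chunk may carry only the role; content can be absent or null.
            parts.append(delta.get("content") or "")
    return "".join(parts)


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": " there!"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Hello there!
```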
To make it as simple as possible, you don't even need to create a DeepInfra account to try our models. Just pass an empty string as the api_key and you are good to go. We rate-limit unauthenticated requests by IP address.
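Clients that skip the API key share their IP address's quota, so a script sending many requests may eventually be throttled. The limits and the way throttling is signalled are not stated here; the sketch below assumes the standard HTTP 429 (Too Many Requests) status and retries with exponential backoff plus jitter. `call_with_backoff` is a hypothetical helper, and `send` is any zero-argument callable, such as `lambda: urllib.request.urlopen(req)`. The pre-1.0 OpenAI client raises its own `openai.error.RateLimitError` rather than `urllib.error.HTTPError`, so with that client you would catch that exception instead.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    send: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying with exponential backoff on HTTP 429 (assumed throttle signal)."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a throttle, or when retries are exhausted.
            if err.code != 429 or attempt == max_retries:
                raise
            # Wait base_delay * 2**attempt seconds, plus random jitter so clients
            # sharing an IP address do not retry in lockstep.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("max_retries must be >= 0")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.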
When you are ready to use our models in production, you can create a DeepInfra account and get an API key. We offer the best pricing for the Llama 2 70B model, at just $1 per 1M tokens. If you need any help, just reach out to us on our Discord server.
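At a flat rate, estimating spend is simple arithmetic: cost = tokens / 1,000,000 × price per 1M tokens. Here is a minimal sketch using the $1 per 1M tokens figure quoted above, assuming prompt and completion tokens are billed at the same rate (the helper name and the traffic numbers are hypothetical):

```python
from decimal import Decimal

# Price quoted in this post for Llama 2 70B: $1 per 1M tokens.
# Assumption: prompt and completion tokens are billed at the same rate.
PRICE_PER_MILLION_USD = Decimal("1.00")


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      price_per_million: Decimal = PRICE_PER_MILLION_USD) -> Decimal:
    """Estimated cost in USD for one request at a flat per-token rate."""
    total_tokens = prompt_tokens + completion_tokens
    return price_per_million * total_tokens / 1_000_000


# Hypothetical workload: 10,000 requests of 400 prompt + 200 completion tokens
per_request = estimate_cost_usd(400, 200)
print(f"${per_request * 10_000:.2f}")  # $6.00
```

Those 10,000 requests total 6M tokens, which comes to $6.00 at this rate. `Decimal` avoids binary floating-point rounding in currency values.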
© 2026 DeepInfra. All rights reserved.