DeepInfra is officially live as an Inference Provider on the Hugging Face Hub. You can now call DeepInfra-hosted models directly from Hugging Face model pages, through Hugging Face's OpenAI-compatible router (which works with any OpenAI SDK), or via the Hugging Face SDKs in Python and JavaScript.
Hugging Face's Inference Providers system lets developers run inference against partner platforms without leaving the Hub. As of today, DeepInfra is one of those partners.
At launch, we support chat completion and text generation tasks. That covers most open-weight LLMs people deploy in production — DeepSeek V4, Kimi-K2.6, GLM-5.1, Llama, Qwen, Mistral, and many more. Support for our other model categories (text-to-image, text-to-video, embeddings, speech) will roll out next.
You can browse every DeepInfra-supported model here: 👉 huggingface.co/models?inference_provider=deepinfra
You have two ways to authenticate, and both work with the same code.
Option 1 — Use your DeepInfra API key. Add it to your Hugging Face provider settings. Requests go directly to DeepInfra and are billed to your DeepInfra account at standard rates.
Option 2 — Use your Hugging Face token. Hugging Face will route your request to DeepInfra and bill it to your HF account. PRO users get $2 of inference credits each month; free users get a small monthly quota.
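Because both options use the same client code, the only thing a script has to supply is a Hugging Face token; which account gets billed is decided by your provider settings, not by the code. The following is a minimal sketch of that setup, assuming the token lives in an HF_TOKEN environment variable (the name used in the JavaScript example in this post). The helper names get_hf_token and make_client are hypothetical and exist only for illustration.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from the environment, or fail with a clear message."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token before calling the router."
        )
    return token


def make_client():
    """Build an InferenceClient with an explicit token.

    Assumes `pip install huggingface_hub`. The import is done lazily so that
    get_hf_token() above stays dependency-free.
    """
    from huggingface_hub import InferenceClient

    return InferenceClient(api_key=get_hf_token())
```

Failing early on a missing token tends to give a clearer error than letting the first request come back unauthorized.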
# Python (huggingface_hub)
from huggingface_hub import InferenceClient

# With no arguments, the client uses your saved Hugging Face login
# or the HF_TOKEN environment variable.
client = InferenceClient()

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {"role": "user", "content": "Write a Fibonacci function with memoization."}
    ],
)
print(completion.choices[0].message)

// JavaScript (@huggingface/inference)
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const completion = await client.chatCompletion({
  model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message);

The Hugging Face router is OpenAI-compatible, so existing OpenAI code keeps working once the client points base_url at the HF router and uses your Hugging Face token as the API key:
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],  # Hugging Face token, not an OpenAI key
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[{"role": "user", "content": "Hello!"}],
)

Apart from the client setup, the only DeepInfra-specific detail is the :deepinfra suffix on the model id, which selects DeepInfra as the provider.
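To make "OpenAI-compatible" concrete, here is a rough sketch of the HTTP request an OpenAI-style client would send to the router, built with the Python standard library only. It assumes the standard OpenAI path /chat/completions under the base_url shown above and a Bearer token header. The helper names with_provider, build_chat_request, and send_chat_request are hypothetical and used only for illustration.

```python
import json
import urllib.request

ROUTER_BASE_URL = "https://router.huggingface.co/v1"  # base_url from the OpenAI example


def with_provider(model_id: str, provider: str = "deepinfra") -> str:
    """Append the provider suffix unless the model id already carries one."""
    return model_id if ":" in model_id else f"{model_id}:{provider}"


def build_chat_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": with_provider(model_id),
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{ROUTER_BASE_URL}/chat/completions",  # assumed standard OpenAI path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling build_chat_request("deepseek-ai/DeepSeek-V4-Pro", "Hello!", token) and passing the result to send_chat_request should be roughly equivalent to the SDK call above. Separating request construction from sending also makes the payload easy to inspect without using any credits.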
If you already use DeepInfra, nothing changes — your existing API and account work exactly as they always have. What's new is reach.