DeepInfra is now a supported Hugging Face Inference Provider
Published on 2026.04.29 by Aray Sultanbekova

DeepInfra is officially live as an Inference Provider on the Hugging Face Hub. You can now call DeepInfra-hosted models directly from Hugging Face model pages, through Hugging Face's OpenAI-compatible router (which works with any OpenAI SDK), or via the Hugging Face SDKs in Python and JavaScript.

What's new

Hugging Face's Inference Providers system lets developers run inference against partner platforms without leaving the Hub. As of today, DeepInfra is one of those partners.

At launch, we support chat completion and text generation tasks. That covers most open-weight LLMs people deploy in production — DeepSeek V4, Kimi-K2.6, GLM-5.1, Llama, Qwen, Mistral, and many more. Support for our other model categories (text-to-image, text-to-video, embeddings, speech) will roll out next.

You can browse every DeepInfra-supported model here: 👉 huggingface.co/models?inference_provider=deepinfra

How to use it

You have two ways to authenticate, and both work with the same code.

Option 1 — Use your DeepInfra API key. Add it to your Hugging Face provider settings. Requests go directly to DeepInfra and are billed to your DeepInfra account at standard rates.

Option 2 — Use your Hugging Face token. Hugging Face will route your request to DeepInfra and bill it to your HF account. PRO users get $2 of inference credits each month; free users get a small monthly quota.
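
Either way, the code examples in this post authenticate with a single token. Here is a minimal sketch of loading it, assuming the token is stored in an environment variable named HF_TOKEN (the name used in the JavaScript and OpenAI examples here). The helper name resolve_hf_token is hypothetical and not part of any SDK.

```python
import os


def resolve_hf_token(env=None):
    """Return the token from HF_TOKEN, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running the examples.")
    return token
```

Failing early with a clear message is usually easier to debug than an authentication error returned by a remote call.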

Python

from huggingface_hub import InferenceClient

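# With no arguments, the client uses the token from the HF_TOKEN environment variable or a cached Hugging Face login.
client = InferenceClient()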

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {"role": "user", "content": "Write a Fibonacci function with memoization."}
    ],
)

print(completion.choices[0].message)

JavaScript

import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const completion = await client.chatCompletion({
  model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message);

Using the OpenAI SDK

The Hugging Face router is OpenAI-compatible, so existing OpenAI code needs only small changes: point base_url at the HF router and pass your Hugging Face token as the API key:

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[{"role": "user", "content": "Hello!"}],
)

To route a request to DeepInfra, append the :deepinfra suffix to the model id.
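
To make the moving parts explicit, here is a rough sketch of the HTTP request that an OpenAI-style client would send to the router, built with only the Python standard library. It assumes the standard OpenAI path /chat/completions under the base URL shown above. The helper names with_provider and build_chat_request are hypothetical, and the request is constructed but never sent.

```python
import json
import urllib.request

# Base URL taken from the OpenAI SDK example in this post.
ROUTER_BASE_URL = "https://router.huggingface.co/v1"


def with_provider(model_id, provider="deepinfra"):
    """Return model_id with a ':provider' suffix, replacing any existing suffix."""
    base = model_id.partition(":")[0]
    if not base or not provider:
        raise ValueError("model_id and provider must be non-empty")
    return f"{base}:{provider}"


def build_chat_request(model_id, prompt, token):
    """Build (but do not send) a chat-completion POST request for the router."""
    body = {
        "model": with_provider(model_id),
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{ROUTER_BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call. The SDK clients shown earlier do the equivalent work and also parse the response for you.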

What this means for our users

If you already use DeepInfra, nothing changes — your existing API and account work exactly as they always have. What's new is reach.

  • Discoverability. Every Hugging Face model page that runs on DeepInfra now shows us as a supported provider, with one-click code snippets in Python, JavaScript, and cURL.
  • Same pricing, no markup. Hugging Face passes through DeepInfra's per-token rates without any added fees. You pay the same whether you call us directly or via the HF router.
  • Drop-in for HF-based workflows. If your team already uses Hugging Face for model search, evaluation, or agent tooling (Pi, OpenCode, Hermes Agents, VS Code with Copilot, and more), DeepInfra is now a one-line provider swap.
  • Try before you buy. Use the Inference Playground to test any DeepInfra-supported model in the browser before wiring it into your stack.
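
Because pricing is per token, the cost of a request can be estimated from its token counts. This is a minimal sketch that assumes rates are quoted in US dollars per one million tokens, with separate input and output rates. The function name estimate_cost_usd is hypothetical, and no real DeepInfra rates are encoded here.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_rate_per_million, output_rate_per_million):
    """Estimate request cost in USD from token counts and per-million-token rates."""
    values = (input_tokens, output_tokens,
              input_rate_per_million, output_rate_per_million)
    if min(values) < 0:
        raise ValueError("token counts and rates must be non-negative")
    total = (input_tokens * input_rate_per_million
             + output_tokens * output_rate_per_million)
    return total / 1_000_000
```

Supply the rates from the pricing listed for the model you use. The same estimate should apply whether the request goes to DeepInfra directly or through the HF router, since the post states that no markup is added.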