
Starting with LangChain v0.0.322, you can run efficient async generation and stream tokens with DeepInfra.
The DeepInfra wrapper now supports native async calls, so your async pipelines should perform better: calls no longer need a thread per invocation.
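```python
import asyncio
from langchain.llms.deepinfra import DeepInfra

async def async_predict():
    # Assumes DEEPINFRA_API_TOKEN is set in the environment.
    llm = DeepInfra(model_id="meta-llama/Llama-2-7b-chat-hf")
    output = await llm.apredict("What is 2 + 2?")
    print(output)

asyncio.run(async_predict())
```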
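Native async calls let several requests be in flight at once on a single event-loop thread. The sketch below shows that pattern with `asyncio.gather`. So that it runs without an API token, it uses a hypothetical `FakeAsyncLLM` whose `apredict` only sleeps to simulate network latency. `predict_many` is also an illustrative helper and is not part of LangChain.

```python
import asyncio
import threading
import time


class FakeAsyncLLM:
    """Hypothetical stand-in for an LLM with a native async apredict()."""

    def __init__(self, latency: float = 0.2):
        self.latency = latency
        self.threads: set = set()  # ids of the threads that served a request

    async def apredict(self, prompt: str) -> str:
        self.threads.add(threading.get_ident())
        await asyncio.sleep(self.latency)  # simulated network wait; yields to the loop
        return f"answer to: {prompt}"


async def predict_many(llm, prompts):
    # All requests run concurrently on the event loop's single thread.
    return await asyncio.gather(*(llm.apredict(p) for p in prompts))


async def main():
    llm = FakeAsyncLLM(latency=0.2)
    prompts = [f"What is {i} + {i}?" for i in range(5)]
    start = time.perf_counter()
    answers = await predict_many(llm, prompts)
    elapsed = time.perf_counter() - start
    return llm, answers, elapsed


if __name__ == "__main__":
    llm, answers, elapsed = asyncio.run(main())
    print(answers)
    print(f"{len(answers)} calls in {elapsed:.2f}s on {len(llm.threads)} thread(s)")
```

`predict_many` only calls `apredict`, so you could pass a `DeepInfra` instance in place of the fake. Real throughput would then depend on service latency and any rate limits.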
Streaming lets you receive each token of the response as it is generated. This is essential in user-facing applications, where users would otherwise wait for the entire completion before seeing anything.
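```python
from langchain.llms.deepinfra import DeepInfra

def streaming():
    llm = DeepInfra(model_id="meta-llama/Llama-2-7b-chat-hf")
    # Llama 2 chat models expect the prompt wrapped in [INST] ... [/INST] tags.
    for chunk in llm.stream("[INST] Hello [/INST] "):
        print(chunk, end='', flush=True)  # print each token as it arrives
    print()  # end the line once the stream finishes

streaming()
```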
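A user-facing app often needs to render tokens as they arrive and also keep the complete reply, for example to store it in chat history. The following minimal sketch does both. It assumes only that the stream yields string chunks, as `llm.stream(...)` does for an LLM in the example above. `collect_stream` is a hypothetical helper, and a plain list stands in for the live stream so the snippet runs offline.

```python
from typing import Callable, Iterable


def collect_stream(
    chunks: Iterable[str],
    on_chunk: Callable[[str], None] = lambda c: print(c, end="", flush=True),
) -> str:
    """Forward each chunk to on_chunk as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        on_chunk(chunk)  # e.g. render the token in the UI immediately
        parts.append(chunk)
    return "".join(parts)


# Offline usage: a list of strings stands in for llm.stream(...).
seen = []
text = collect_stream(["Hello", "!", " How", " can", " I", " help?"], on_chunk=seen.append)
print(text)
```

With the real wrapper this would be `reply = collect_stream(llm.stream("[INST] Hello [/INST] "))`.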
You can also use the asynchronous streaming API, which is likewise implemented natively.
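```python
import asyncio
from langchain.llms.deepinfra import DeepInfra

async def async_streaming():
    llm = DeepInfra(model_id="meta-llama/Llama-2-7b-chat-hf")
    async for chunk in llm.astream("[INST] Hello [/INST] "):  # does not block the event loop
        print(chunk, end='', flush=True)
    print()

asyncio.run(async_streaming())
```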
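Streaming mainly improves perceived latency, so time to first token is a useful metric to watch. The sketch below consumes any async iterator of strings, such as `llm.astream(...)`, and records how long the first chunk took to arrive. `astream_with_ttft` and `fake_astream` are hypothetical names. The fake stream uses `asyncio.sleep` to simulate per-token delay so the example runs offline.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def astream_with_ttft(chunks: AsyncIterator[str]) -> Tuple[str, Optional[float]]:
    """Collect an async token stream; return (full_text, seconds_to_first_chunk)."""
    start = time.perf_counter()
    ttft = None
    parts = []
    async for chunk in chunks:
        if ttft is None:
            ttft = time.perf_counter() - start  # the latency the user actually notices
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts), ttft


async def fake_astream(tokens, delay: float = 0.05) -> AsyncIterator[str]:
    """Hypothetical stand-in for llm.astream(...): yields tokens after a delay."""
    for token in tokens:
        await asyncio.sleep(delay)
        yield token


text, ttft = asyncio.run(astream_with_ttft(fake_astream(["Hi", " there", "!"])))
print(f"first token after {ttft:.3f}s")
```

For an empty stream the helper returns `None` as the time to first token, so callers can tell "no output" apart from "fast output".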
© 2026 DeepInfra. All rights reserved.