
# create a virtual environment
python3 -m venv .venv
# activate environment in current shell
. .venv/bin/activate
# install openai python client
pip install openai
import openai
stream = True # or False
# Point OpenAI client to our endpoint
openai.api_key = "<YOUR DEEPINFRA API KEY>"
openai.api_base = "https://api.deepinfra.com/v1/openai"
# Your chosen model here
MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"
chat_completion = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=stream,
    max_tokens=100,
    # top_p=0.5,
)

if stream:
    # print the streamed chunks as they arrive
    for event in chat_completion:
        print(event.choices)
else:
    # print the complete response in one go
    print(chat_completion.choices[0].message.content)
Note that both streaming and batch mode are supported.
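When streaming, each event carries only an incremental delta rather than a complete message. Here is a minimal sketch of accumulating those deltas into the full reply, reusing the client setup and MODEL_DI from above and assuming the standard OpenAI chat delta format:

# a sketch: accumulate the streamed deltas into one string
stream_response = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=True,
    max_tokens=100,
)
full_text = ""
for event in stream_response:
    # each chunk's delta holds the next piece of the message (it may be empty)
    full_text += event.choices[0].delta.get("content", "")
print(full_text)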
If you're already using OpenAI chat completions in your project, you only need to
change the api_key, api_base and model parameters:
import openai

# set these before running any completions
openai.api_key = "YOUR DEEPINFRA TOKEN"
openai.api_base = "https://api.deepinfra.com/v1/openai"

openai.ChatCompletion.create(
    model="CHOSEN MODEL HERE",
    # ...
)
Our OpenAI API-compatible models are priced per output token (just like OpenAI); the current price is $1 per 1M tokens.
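As a rough back-of-the-envelope check, a non-streaming response includes a usage field in the standard OpenAI format, which you can multiply by the output-token rate above. A sketch, reusing the setup and MODEL_DI from earlier:

# estimate the cost of one non-streaming completion
PRICE_PER_TOKEN = 1.0 / 1_000_000  # $1 per 1M output tokens

response = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    max_tokens=100,
)
output_tokens = response.usage.completion_tokens
print(f"~${output_tokens * PRICE_PER_TOKEN:.6f} for {output_tokens} output tokens")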
Check the OpenAI API docs for more in-depth information and examples.