We offer an OpenAI-compatible API for all recent LLM models and all embedding models.
The APIs we support are Chat Completions, Completions, and Embeddings.
The api_base is https://api.deepinfra.com/v1/openai.
pip install 'openai>=1.0.0'
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR DEEPINFRA TOKEN: deepctl auth token or get one from https://deepinfra.com/dash/api_keys>",
    base_url="https://api.deepinfra.com/v1/openai",
)

stream = True  # or False

MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"

chat_completion = client.chat.completions.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=stream,
    max_tokens=100,
)

if stream:
    # print the chat completion
    for event in chat_completion:
        print(event.choices)
else:
    print(chat_completion.choices[0].message.content)
You can of course use regular HTTP:
export TOKEN="$(deepctl auth token)"
export URL_DI="https://api.deepinfra.com/v1/openai/chat/completions"
export MODEL_DI="meta-llama/Llama-2-70b-chat-hf"
curl "$URL_DI" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"stream": true,
"model": "'$MODEL_DI'",
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"max_tokens": 100
}'
If you're already using OpenAI's chat completion endpoint, you can just set the base_url and the API token, change the model name, and you're good to go.
If you're still on the legacy openai Python library (versions before 1.0.0), the same endpoint works:
pip install 'openai<1.0.0'
import openai
stream = True # or False
# Point OpenAI client to our endpoint
openai.api_key = "<YOUR DEEPINFRA TOKEN: deepctl auth token>"
openai.api_base = "https://api.deepinfra.com/v1/openai"
MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"
chat_completion = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=stream,
    max_tokens=100,
    # top_p=0.5,
)

if stream:
    # print the chat completion
    for event in chat_completion:
        print(event.choices)
else:
    print(chat_completion.choices[0].message.content)
Some models have more than one version available. You can run inference against a particular version by using the {"model": "MODEL_NAME:VERSION", ...} format.
You can also run inference against a deploy_id by using {"model": "deploy_id:DEPLOY_ID", ...}. This is especially useful for Custom LLMs: you can start inferring before the deployment is running (and before you have the model-name+version pair).
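For example, reusing the client from the example above (the version string and deploy ID below are placeholders; substitute your own):

# Pin a specific model version (":VERSION" is a placeholder).
chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf:VERSION",
    messages=[{"role": "user", "content": "Hello world"}],
    max_tokens=100,
)

# Or target a deployment directly by its deploy_id (placeholder value).
chat_completion = client.chat.completions.create(
    model="deploy_id:DEPLOY_ID",
    messages=[{"role": "user", "content": "Hello world"}],
    max_tokens=100,
)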
Please note that we're not yet 100% compatible. Drop us a line on Discord if you'd like us to prioritize something that's missing. Supported request attributes:
ChatCompletions and Completions:
- model, including specifying version/deploy_id support
- messages (roles: system, user, assistant)
- max_tokens
- stream
- temperature
- top_p
- stop
- n
- presence_penalty
- frequency_penalty
- response_format ({"type": "json"} only)
- tools, tool_choice
- echo, logprobs -- only for (non-chat) completions

Embeddings:
- model
- input
- encoding_format -- float only
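Embeddings go through the same client and base_url. A minimal sketch, assuming the openai>=1.0.0 client from above and an embedding model that is available on DeepInfra (the model name here is only an example):

embeddings = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",  # example embedding model; use any available one
    input=["Hello world", "DeepInfra offers an OpenAI-compatible API"],
    encoding_format="float",  # only "float" is supported
)
print(embeddings.data[0].embedding[:5])  # first few dimensions of the first vector
print(embeddings.usage)                  # token usage info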