
OpenAI API

We offer an OpenAI-compatible API for all recent LLMs and all embedding models.

The APIs we support are:

  • chat completion, both streaming and regular, supported for all chat-tuned LLMs
  • completion, both streaming and regular, supported for all LLMs (chat-tuned or not)
  • embeddings, supported for all embedding models

The api_base is https://api.deepinfra.com/v1/openai.

Example with the recent Python client

pip install 'openai>=1.0.0'

from openai import OpenAI

client = OpenAI(
    api_key="<YOUR DEEPINFRA TOKEN: deepctl auth token, or get one from https://deepinfra.com/dash/api_keys>",
    base_url="https://api.deepinfra.com/v1/openai",
)

stream = True # or False

MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"
chat_completion = client.chat.completions.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=stream,
    max_tokens=100,
)

if stream:
    # print the response as it streams in, token by token
    for event in chat_completion:
        print(event.choices[0].delta.content or "", end="")
    print()
else:
    print(chat_completion.choices[0].message.content)
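
The plain completions endpoint (no chat) works the same way. A minimal sketch reusing the client and model from above; the prompt is just illustrative:

completion = client.completions.create(
    model=MODEL_DI,
    prompt="The capital of France is",
    max_tokens=16,
)
print(completion.choices[0].text)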

You can of course use regular HTTP:

export TOKEN="$(deepctl auth token)"
export URL_DI="https://api.deepinfra.com/v1/openai/chat/completions"
export MODEL_DI="meta-llama/Llama-2-70b-chat-hf"

curl "$URL_DI" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
      "stream": true,
      "model": "'$MODEL_DI'",
      "messages": [
        {
          "role": "user",
          "content": "Hello!"
        }
      ],
      "max_tokens": 100
    }'

If you're already using OpenAI's chat completion endpoint, you only need to set the base_url and the API token and change the model name, and you're good to go.

Example with the legacy Python client

pip install 'openai<1.0.0'

import openai

stream = True # or False

# Point OpenAI client to our endpoint
openai.api_key = "<YOUR DEEPINFRA TOKEN: deepctl auth token>"
openai.api_base = "https://api.deepinfra.com/v1/openai"

MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"
chat_completion = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=stream,
    max_tokens=100,
    # top_p=0.5,
)

if stream:
    # print the response as it streams in, token by token
    for event in chat_completion:
        print(event.choices[0].delta.get("content", ""), end="")
    print()
else:
    print(chat_completion.choices[0].message.content)

Model parameter

Some models have more than one version available; you can run inference against a particular version by using the {"model": "MODEL_NAME:VERSION", ...} format.

You can also run inference against a deploy_id by using {"model": "deploy_id:DEPLOY_ID", ...}. This is especially useful for Custom LLMs: you can run inference before the deployment is running (and before you have the model-name+version pair).
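
For example, all three forms below are valid values for the model parameter (VERSION and DEPLOY_ID are placeholders):

model = "meta-llama/Llama-2-70b-chat-hf"          # latest version
model = "meta-llama/Llama-2-70b-chat-hf:VERSION"  # a specific version
model = "deploy_id:DEPLOY_ID"                     # a specific deployment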

Caveats

Please note that we're not yet 100% compatible; drop us a line on Discord if you'd like us to prioritize something that's missing. Supported request attributes are listed below (an example sketch follows each list):

ChatCompletions and Completions:

  • model, including the version/deploy_id formats described above
  • messages (roles: system, user, assistant)
  • max_tokens
  • stream
  • temperature
  • top_p
  • stop
  • n
  • presence_penalty
  • frequency_penalty
  • response_format ({"type": "json"} only)
  • tools, tool_choice
  • echo, logprobs -- only for (non chat) completions
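
For example, a chat request combining several of these attributes could look like the sketch below (reusing the client from above; parameter values are illustrative):

chat_completion = client.chat.completions.create(
    model=MODEL_DI,
    messages=[
        {"role": "system", "content": "Answer in JSON."},
        {"role": "user", "content": "Name three colors."},
    ],
    max_tokens=100,
    temperature=0.7,
    top_p=0.9,
    stop=["\n\n"],
    response_format={"type": "json"},
)
print(chat_completion.choices[0].message.content)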

Embeddings:

  • model
  • input
  • encoding_format -- float only
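
A minimal embeddings sketch with the recent Python client; the model name is a placeholder for any embedding model we host:

embeddings = client.embeddings.create(
    model="<EMBEDDINGS MODEL NAME>",
    input=["Hello world"],
    encoding_format="float",
)
print(embeddings.data[0].embedding[:8])  # first 8 dimensions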