We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

🚀 New models by Bria.ai, generate and edit images at scale 🚀

Use OpenAI API clients with LLaMas
Published on 2023.08.28 by Iskren Chernev
Use OpenAI API clients with LLaMas

Getting started

# create a virtual environment
python3 -m venv .venv
# activate environment in current shell
. .venv/bin/activate
# install openai python client
pip install openai
copy

Choose a model

Run OpenAI chat.completion

import openai

stream = True # or False

# Point OpenAI client to our endpoint
openai.api_key = "<YOUR DEEPINFRA API KEY>"
openai.api_base = "https://api.deepinfra.com/v1/openai"

# Your chosen model here
MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"
chat_completion = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=stream,
    max_tokens=100,
    # top_p=0.5,
)

if stream:
    # print the chat completion
    for event in chat_completion:
        print(event.choices)
else:
    print(chat_completion.choices[0].message.content)
copy

Note that both streaming and batch mode are supported.

Existing OpenAI integration

If you're already using OpenAI chat completion in your project, you need to change the api_key, api_base and model params:

import openai

# set these before running any completions
openai.api_key = "YOUR DEEPINFRA TOKEN"
openai.api_base = "https://api.deepinfra.com/v1/openai"

openai.ChatCompletion.create(
    model="CHOSEN MODEL HERE",
    # ...
)
copy

Pricing

Our OpenAI API compatible models are priced on token output (just like OpenAI). Our current price is $1 / 1M tokens.

Docs

Check the docs for more in-depth information and examples openai api.

Related articles
Building a Voice Assistant with Whisper, LLM, and TTSBuilding a Voice Assistant with Whisper, LLM, and TTSLearn how to create a voice assistant using Whisper for speech recognition, LLM for conversation, and TTS for text-to-speech.
Search That Actually Works: A Guide to LLM RerankersSearch That Actually Works: A Guide to LLM RerankersSearch relevance isn’t a nice-to-have feature for your site or app. It can make or break the entire user experience. When a customer searches "best laptop for video editing" and gets results for gaming laptops or budget models, they leave empty-handed. Embeddings help you find similar content, bu...
Seed Anchoring and Parameter Tweaking with SDXL Turbo: Create Stunning Cubist ArtSeed Anchoring and Parameter Tweaking with SDXL Turbo: Create Stunning Cubist ArtIn this blog post, we're going to explore how to create stunning cubist art using SDXL Turbo using some advanced image generation techniques.