We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

FLUX.2 is live! High-fidelity image generation made simple.

Deploy Custom LLMs on DeepInfra
Published on 2024.03.01 by Iskren Chernev
Deploy Custom LLMs on DeepInfra

Did you just finetune your favorite model and are wondering where to run it? Well, we have you covered. Simple API and predictable pricing.

Put your model on huggingface

Use a private repo, if you wish, we don't mind. Create a hf access token just for the repo for better security.

Create custom deployment

Via Web

You can use the Web UI to create a new deployment.

Custom LLM Web UI

Via HTTP

We also offer HTTP API:

curl -X POST https://api.deepinfra.com/deploy/llm -d '{
    "model_name": "test-model",
    "gpu": "A100-80GB",
    "num_gpus": 2,
    "max_batch_size": 64,
    "hf": {
        "repo": "meta-llama/Llama-2-7b-chat-hf"
    },
    "settings": {
        "min_instances": 1,
        "max_instances": 1,
    }
}' -H 'Content-Type: application/json' \
    -H "Authorization: Bearer YOUR_API_KEY"
copy

Use it

curl -X POST \
    -d '{"input": "Hello"}' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer YOUR_API_KEY" \
    'https://api.deepinfra.com/v1/inference/github-username/di-model-name'
copy

For in depth tutorial check Custom LLM Docs.

Related articles
Fork of Text Generation Inference.Fork of Text Generation Inference.The text generation inference open source project by huggingface looked like a promising framework for serving large language models (LLM). However, huggingface announced that they will change the license of code with version v1.0.0. While the previous license Apache 2.0 was permissive, the new on...
Long Context models incomingLong Context models incomingMany users requested longer context models to help them summarize bigger chunks of text or write novels with ease. We're proud to announce our long context model selection that will grow bigger in the comming weeks. Models Mistral-based models have a context size of 32k, and amazon recently r...
Compare Llama2 vs OpenAI models for FREE.Compare Llama2 vs OpenAI models for FREE.At DeepInfra we host the best open source LLM models. We are always working hard to make our APIs simple and easy to use. Today we are excited to announce a very easy way to quickly try our models like Llama2 70b and [Mistral 7b](/mistralai/Mistral-7B-Instruc...