Deploy Custom LLMs on DeepInfra

Published on 2024.03.01 by Iskren Chernev

Did you just finetune your favorite model and are now wondering where to run it? Well, we have you covered: a simple API and predictable pricing.

Put your model on Hugging Face

Use a private repo if you wish; we don't mind. For better security, create a Hugging Face access token scoped to just that repo.
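
If your finetuned weights are still local, the Hugging Face CLI can push them to a (private) repo. A minimal sketch; your-username/my-finetuned-llm and the local directory are placeholders:

# authenticate once with your access token
huggingface-cli login
# upload the model directory; --private creates the repo as private if it doesn't exist yet
huggingface-cli upload --private your-username/my-finetuned-llm ./my-finetuned-llm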

Create custom deployment

Via Web

You can use the Web UI to create a new deployment.

[Screenshot: Custom LLM Web UI]

Via HTTP

We also offer an HTTP API:

curl -X POST https://api.deepinfra.com/deploy/llm -d '{
    "model_name": "test-model",
    "gpu": "A100-80GB",
    "num_gpus": 2,
    "max_batch_size": 64,
    "hf": {
        "repo": "meta-llama/Llama-2-7b-chat-hf"
    },
    "settings": {
        "min_instances": 1,
        "max_instances": 1,
    }
}' -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $(deepctl auth token)"
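
If you prefer to script this, the same request works with the payload kept in a file. A sketch assuming the JSON above is saved as deploy.json and jq is available for pretty-printing the response:

# same endpoint and headers as above, payload read from a file
curl -s -X POST https://api.deepinfra.com/deploy/llm \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $(deepctl auth token)" \
    -d @deploy.json | jq .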

Use it

deepctl infer -m github-username/di-model-name -i input="Hello"
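
If you'd rather skip the CLI, the same deployment can be queried over plain HTTP. A sketch assuming it is exposed at DeepInfra's standard inference endpoint:

curl -X POST https://api.deepinfra.com/v1/inference/github-username/di-model-name \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $(deepctl auth token)" \
    -d '{"input": "Hello"}'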

For an in-depth tutorial, check the Custom LLM Docs.