DeepInfra raises $107M Series B to scale the inference cloud — read the announcement

Did you just finetune your favorite model and are wondering where to run it? Well, we have you covered. Simple API and predictable pricing.
Use a private repo, if you wish, we don't mind. Create a hf access token just for the repo for better security.
You can use the Web UI to create a new deployment.
We also offer HTTP API:
curl -X POST https://api.deepinfra.com/deploy/llm -d '{
"model_name": "test-model",
"gpu": "A100-80GB",
"num_gpus": 2,
"max_batch_size": 64,
"hf": {
"repo": "meta-llama/Llama-2-7b-chat-hf"
},
"settings": {
"min_instances": 1,
"max_instances": 1,
}
}' -H 'Content-Type: application/json' \
-H "Authorization: Bearer YOUR_API_KEY"
curl -X POST \
-d '{"input": "Hello"}' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer YOUR_API_KEY" \
'https://api.deepinfra.com/v1/inference/github-username/di-model-name'
For in depth tutorial check Custom LLM Docs.
Qwen3.5 4B via DeepInfra: Latency, Throughput & Cost<p>About Qwen3.5 4B (Reasoning) Qwen3.5 4B is a compact 4-billion parameter open-weights model released in March 2026 as part of Alibaba Cloud’s Qwen3.5 Small Model Series. It employs an Efficient Hybrid Architecture combining Gated Delta Networks (a form of linear attention) with sparse Mixture-of-Experts, delivering high-throughput inference with minimal latency overhead — a significant architectural […]</p>
Power the Next Era of Image Generation with FLUX.2 Visual Intelligence on DeepInfraDeepInfra is excited to support FLUX.2 from day zero, bringing the newest visual intelligence model from Black Forest Labs to our platform at launch. We make it straightforward for developers, creators, and enterprises to run the model with high performance, transparent pricing, and an API designed for productivity.
The easiest way to build AI applications with Llama 2 LLMs.The long awaited Llama 2 models are finally here!
We are excited to show you how to use them with DeepInfra. These collection of models represent
the state of the art in open source language models.
They are made available by Meta AI and the l...© 2026 DeepInfra. All rights reserved.