FLUX.2 is live! High-fidelity image generation made simple.

Did you just finetune your favorite model and are wondering where to run it? Well, we have you covered. Simple API and predictable pricing.
Use a private repo, if you wish, we don't mind. Create a hf access token just for the repo for better security.
You can use the Web UI to create a new deployment.
We also offer HTTP API:
curl -X POST https://api.deepinfra.com/deploy/llm -d '{
"model_name": "test-model",
"gpu": "A100-80GB",
"num_gpus": 2,
"max_batch_size": 64,
"hf": {
"repo": "meta-llama/Llama-2-7b-chat-hf"
},
"settings": {
"min_instances": 1,
"max_instances": 1,
}
}' -H 'Content-Type: application/json' \
-H "Authorization: Bearer YOUR_API_KEY"
curl -X POST \
-d '{"input": "Hello"}' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer YOUR_API_KEY" \
'https://api.deepinfra.com/v1/inference/github-username/di-model-name'
For in depth tutorial check Custom LLM Docs.
Art That Talks Back: A Hands-On Tutorial on Talking ImagesTurn any image into a talking masterpiece with this step-by-step guide using DeepInfra’s GenAI models.
FLUX.1-dev Guide: Mastering Text-to-Image AI Prompts for Stunning and Consistent VisualsLearn how to craft compelling prompts for FLUX.1-dev to create stunning images.
Model Distillation Making AI Models EfficientAI Model Distillation Definition & Methodology
Model distillation is the art of teaching a smaller, simpler model to perform as well as a larger one. It's like training an apprentice to take over a master's work—streamlining operations with comparable performance . If you're struggling with depl...© 2026 Deep Infra. All rights reserved.