DeepInfra raises $107M Series B to scale the inference cloud — read the announcement

Note: The supported base models are listed on this page. If you need a base model that is not listed, please contact us at feedback@deepinfra.com.
The rate limit applies to the combined traffic of all LoRA adapter models that share the same base model. For example, if you have two LoRA adapter models with the same base model and a rate limit of 200, the two adapters together are limited to 200, not 200 each.
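A shared limit like this can be modeled as one budget per base model that every adapter draws from. The sketch below is a hypothetical client-side illustration, not DeepInfra's implementation. The page does not say what unit the limit counts, so the sketch assumes it counts in-flight (concurrent) requests, and the adapter and base model names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SharedBaseModelLimit:
    """Hypothetical limiter: all adapters on one base model share one budget."""

    limit: int  # assumption: maximum in-flight requests per base model
    base_of: dict[str, str]  # adapter name -> base model name (hypothetical)
    in_flight: dict[str, int] = field(default_factory=dict)

    def try_acquire(self, adapter: str) -> bool:
        """Reserve one slot from the adapter's base-model budget, if any is left."""
        base = self.base_of[adapter]
        used = self.in_flight.get(base, 0)
        if used >= self.limit:
            return False
        self.in_flight[base] = used + 1
        return True

    def release(self, adapter: str) -> None:
        """Return a slot to the shared budget when a request finishes."""
        self.in_flight[self.base_of[adapter]] -= 1


# Two hypothetical adapters on the same base model, with a tiny limit of 2.
limiter = SharedBaseModelLimit(
    limit=2,
    base_of={"my-lora-a": "base-model-x", "my-lora-b": "base-model-x"},
)
print(limiter.try_acquire("my-lora-a"))  # True
print(limiter.try_acquire("my-lora-b"))  # True
print(limiter.try_acquire("my-lora-a"))  # False: the shared budget is used up
```

The third call fails even though `my-lora-a` has only one request in flight, because the budget belongs to the base model rather than to each adapter.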
Pricing for LoRA adapter models is 50% higher than for the base model.
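In other words, the adapter price is the base price multiplied by 1.5. The sketch below shows that arithmetic; the base price of $0.40 per 1M tokens is a made-up illustrative figure, not an actual DeepInfra price. It uses `Decimal` because binary floats would print `0.4 * 1.5` as `0.6000000000000001`.

```python
from decimal import Decimal

# "50% higher than base model" means a 1.5x multiplier on the base price.
LORA_PRICE_MULTIPLIER = Decimal("1.5")


def lora_price(base_price: Decimal) -> Decimal:
    """Price of a LoRA adapter model, given its base model's price."""
    return base_price * LORA_PRICE_MULTIPLIER


# Hypothetical base price: $0.40 per 1M tokens (illustrative only).
print(lora_price(Decimal("0.40")))  # 0.600
```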
LoRA adapter models run slower than the base model because applying the adapter adds compute and memory overhead. In our benchmarks, LoRA adapter models are about 50-60% slower than the base model.
To avoid this overhead, you can merge the LoRA adapter into the base model and serve the merged model with a custom deployment. Its speed will then be close to that of the base model.
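Merging removes the overhead because the low-rank update can be folded into the base weights once, ahead of time. The pure-Python sketch below assumes the standard LoRA parameterization W' = W + (alpha / r) * B A, where W is d x k, B is d x r, and A is r x k. The matrices, `alpha`, and rank are tiny made-up values, and real adapters may use a different scaling, so check your adapter's configuration. The sketch only shows that the merged model gives the same output with a single matrix multiply.

```python
# Sketch only: assumes the standard LoRA form W' = W + (alpha / r) * (B @ A).
# W is d x k, B is d x r, A is r x k; inputs x are k x 1 column vectors.
Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Naive product of an n x m matrix and an m x p matrix."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def scale(x: Matrix, s: float) -> Matrix:
    """Multiply every entry by the scalar s."""
    return [[s * v for v in row] for row in x]


def forward_unmerged(x, w, a, b, alpha, r):
    """Adapter applied on the fly: base matmul plus an extra low-rank path."""
    return add(matmul(w, x), scale(matmul(b, matmul(a, x)), alpha / r))


def merge_lora(w, a, b, alpha, r):
    """Fold the adapter into the base weights once, before deployment."""
    return add(w, scale(matmul(b, a), alpha / r))


def forward_merged(x, w_merged):
    """Merged model: one matmul with the same shape as the base model's."""
    return matmul(w_merged, x)


# Tiny illustrative values: d = 2, k = 3, rank r = 2.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
B = [[0.5, 0.0], [0.0, -0.5]]
ALPHA, R = 4.0, 2
X = [[1.0], [2.0], [3.0]]

print(forward_unmerged(X, W, A, B, ALPHA, R))  # [[11.0], [-3.0]]
print(forward_merged(X, merge_lora(W, A, B, ALPHA, R)))  # [[11.0], [-3.0]]
```

Both paths produce the same result, but the unmerged path does two extra matrix multiplies on every request, which is the kind of overhead that merging eliminates.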
© 2026 DeepInfra. All rights reserved.