
Note: The supported base models are listed on the same page. If you need a base model that is not listed, please contact us at feedback@deepinfra.com.
The rate limit applies to the combined traffic of all LoRA adapter models that share the same base model. For example, if you have two LoRA adapter models with the same base model and a rate limit of 200, the two adapters together are limited to 200, not 200 each.
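A minimal sketch of how a shared limit behaves, assuming the limit counts concurrent in-flight requests (the page does not state the unit). The class `SharedBaseModelLimiter` and the model names are hypothetical and do not describe DeepInfra's actual implementation; the only point is that the counter is keyed by base model rather than by adapter.

```python
# Hypothetical sketch: one budget per base model, shared by every LoRA adapter
# built on it. Assumes the limit counts concurrent in-flight requests.
import threading


class SharedBaseModelLimiter:
    """Counts in-flight requests per base model, not per adapter."""

    def __init__(self, limit_per_base: int, adapter_to_base: dict[str, str]):
        self.limit = limit_per_base
        self.adapter_to_base = adapter_to_base
        self.in_flight: dict[str, int] = {}
        self.lock = threading.Lock()

    def try_acquire(self, adapter: str) -> bool:
        base = self.adapter_to_base[adapter]  # all adapters map onto their base model
        with self.lock:
            used = self.in_flight.get(base, 0)
            if used >= self.limit:
                return False  # shared budget for this base model is exhausted
            self.in_flight[base] = used + 1
            return True

    def release(self, adapter: str) -> None:
        base = self.adapter_to_base[adapter]
        with self.lock:
            self.in_flight[base] -= 1


# Two hypothetical adapters on the same hypothetical base model, limit 200.
limiter = SharedBaseModelLimiter(200, {"adapter-a": "base-x", "adapter-b": "base-x"})
a_ok = sum(limiter.try_acquire("adapter-a") for _ in range(150))
b_ok = sum(limiter.try_acquire("adapter-b") for _ in range(100))
print(a_ok, b_ok)  # 150 50
```

Because both adapters draw from the same counter, the second adapter only gets the capacity the first leaves unused.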
Pricing for a LoRA adapter model is 50% higher than for its base model.
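The 50% markup as arithmetic, in a short sketch assuming per-token pricing quoted per 1M tokens. The $2.00 base rate and the helper names `lora_rate` and `cost` are placeholders for illustration, not actual DeepInfra prices or APIs.

```python
# Sketch: LoRA adapter rate = base-model rate plus 50%.
# The base rate below is a hypothetical placeholder, not a real DeepInfra price.
LORA_MARKUP = 0.50  # "50% higher than the base model"


def lora_rate(base_rate: float) -> float:
    """Rate for the LoRA adapter model, in the same unit as base_rate."""
    return base_rate * (1 + LORA_MARKUP)


def cost(tokens: int, rate_per_million: float) -> float:
    """Cost of `tokens` tokens at a rate quoted per 1M tokens (assumed unit)."""
    return tokens / 1_000_000 * rate_per_million


base_rate = 2.00  # hypothetical $ per 1M tokens
tokens = 500_000
print(cost(tokens, base_rate), cost(tokens, lora_rate(base_rate)))  # 1.0 1.5
```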
A LoRA adapter model runs slower than its base model because applying the adapter adds compute and memory overhead. In our benchmarks, LoRA adapter models were about 50-60% slower than the base model.
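To turn the benchmark figure into an expected throughput range, here is a small sketch. It assumes "X% slower" means throughput (tokens per second) is reduced by X%, which the page does not spell out; the 100 tokens/s base figure is hypothetical.

```python
# Sketch: convert "50-60% slower" into an expected adapter throughput range.
# Assumption: "X% slower" means throughput is reduced by X%.
def lora_throughput_range(base_tokens_per_s: float,
                          slowdown_low: float = 0.50,
                          slowdown_high: float = 0.60) -> tuple[float, float]:
    """Return (worst, best) adapter throughput for a given base-model throughput."""
    worst = base_tokens_per_s * (1 - slowdown_high)
    best = base_tokens_per_s * (1 - slowdown_low)
    return worst, best


worst, best = lora_throughput_range(100.0)  # hypothetical base: 100 tokens/s
print(f"{worst:.0f}-{best:.0f} tokens/s")  # 40-50 tokens/s
```

Under this reading, an adapter keeps roughly 40-50% of the base model's throughput.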
To reduce this overhead, you can merge the LoRA adapter into the base model weights and serve the result as a custom deployment; its speed will then be close to that of the base model.
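Merging works because, in the standard LoRA parameterization, a layer computes W x + (alpha / r) B (A x), and that equals W' x with W' = W + (alpha / r) B A. The toy sketch below, in pure Python with made-up 2x3 matrices, folds the adapter into W and checks that the merged and unmerged outputs agree. It illustrates the general technique only and is not DeepInfra's merge procedure; in practice a library such as Hugging Face PEFT (`merge_and_unload()`) does this for real checkpoints.

```python
# Sketch: fold a LoRA adapter into the base weight, W' = W + (alpha / r) * B @ A.
# Shapes: W is d_out x d_in, B is d_out x r, A is r x d_in. Toy values only.
Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def matvec(w: Matrix, v: list[float]) -> list[float]:
    """Multiply matrix w by column vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    r = len(a)             # LoRA rank = number of rows in A
    delta = matmul(b, a)   # d_out x d_in low-rank update
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # d_out=2, d_in=3
A = [[1.0, 2.0, 0.0]]                    # r=1
B = [[0.5], [-1.0]]
alpha = 2.0
x = [1.0, 1.0, 1.0]

merged = merge_lora(W, A, B, alpha)
# Unmerged path: base output plus scaled adapter output, i.e. two extra matvecs per call.
unmerged = [wx + (alpha / len(A)) * bax
            for wx, bax in zip(matvec(W, x), matvec(B, matvec(A, x)))]
print(matvec(merged, x), unmerged)  # [6.0, -6.0] [6.0, -6.0]
```

The unmerged path runs the extra A and B products on every forward pass, which is the overhead described above; after merging, a single matrix of the base model's shape produces the same outputs (up to floating-point rounding in general).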
© 2026 DeepInfra. All rights reserved.