DeepInfra raises $107M Series B to scale the inference cloud — read the announcement

Note: The list of supported base models is listed on the same page. If you need a base model that is not listed, please contact us at feedback@deepinfra.com
Rate limit will apply on combined traffic of all LoRA adapter models with the same base model. For example, if you have 2 LoRA adapter models with the same base model, and have rate limit of 200. Those 2 LoRA adapter models combined will have rate limit of 200.
Pricing is 50% higher than base model.
LoRA adapter model speed is lower than base model, because there is additional compute and memory overhead to apply the LoRA adapter. From our benchmarks, the LoRA adapter model speed is about 50-60% slower than base model.
You could merge the LoRA adapter with the base model to reduce the overhead. And use custom deployment, the speed will be close to the base model.
Accelerating Reasoning Workflows with Nemotron 3 Nano on DeepInfraDeepInfra is an official launch partner for NVIDIA Nemotron 3 Nano, the newest open reasoning model in the Nemotron family. Our goal is to give developers, researchers, and teams the fastest and simplest path to using Nemotron 3 Nano from day one.
Kimi K2.6 Pricing Guide 2026: Compare Costs & Deployment Strategies<p>Kimi K2.6 matters because it sits in a rare spot: open weights, broad provider availability, and a real spread in pricing and runtime performance depending on where you buy it. Artificial Analysis tracks the model across nine API providers, with blended pricing ranging from $1.15 to $2.15 per 1M tokens and major differences in throughput […]</p>
Gemma 4 Model Overview: Features, Architecture & Use Cases<p>Gemma 4 is Google DeepMind’s latest family of open-weight models, released on April 3, 2026 under the Apache 2.0 license. The family spans four model sizes — from edge-optimized variants for mobile devices to a 31B dense model for server-side deployments — with every model supporting multimodal input, built-in reasoning, and a context window of […]</p>
© 2026 DeepInfra. All rights reserved.