DeepInfra raises $107M Series B to scale the inference cloud — read the announcement

Note: The list of supported base models is listed on the same page. If you need a base model that is not listed, please contact us at feedback@deepinfra.com
Rate limit will apply on combined traffic of all LoRA adapter models with the same base model. For example, if you have 2 LoRA adapter models with the same base model, and have rate limit of 200. Those 2 LoRA adapter models combined will have rate limit of 200.
Pricing is 50% higher than base model.
LoRA adapter model speed is lower than base model, because there is additional compute and memory overhead to apply the LoRA adapter. From our benchmarks, the LoRA adapter model speed is about 50-60% slower than base model.
You could merge the LoRA adapter with the base model to reduce the overhead. And use custom deployment, the speed will be close to the base model.
Nemotron 3 Super Provider Pricing Comparison (2026)<p>Nemotron 3 Super is available from multiple providers, and the price spread is real: OpenRouter lists $0.09/$0.45 per 1M input/output tokens, DeepInfra lists $0.10/$0.50, and the Artificial Analysis median across all providers sits at $0.30/$0.75. The right provider depends on what your workload actually looks like — context requirements, output verbosity, and whether you need […]</p>
OpenClaw Use Cases That Deliver Real ROI<p>An OpenClaw agent that reads your email, opens pull requests, and watches a server is only useful if running it doesn’t feel like leaving the meter running. That’s the quiet constraint behind every OpenClaw use cases discussion. Most of the workflows people show off (morning briefings, multi-agent research, ambient monitoring) only make sense if each […]</p>
NVIDIA Nemotron 3 Super on DeepInfra: 120B MoE Model<p>NVIDIA’s Nemotron 3 Super runs 120 billion parameters while activating only 12 billion per token — a ratio that makes a real difference when orchestrating multiple agents in parallel. It’s built on a novel architecture called LatentMoE, a hybrid of Mamba-2, Mixture-of-Experts, and Attention layers designed from the ground up for agentic, reasoning, and long-context […]</p>
© 2026 DeepInfra. All rights reserved.