DeepInfra raises $107M Series B to scale the inference cloud — read the announcement

Note: The list of supported base models is listed on the same page. If you need a base model that is not listed, please contact us at feedback@deepinfra.com
Rate limit will apply on combined traffic of all LoRA adapter models with the same base model. For example, if you have 2 LoRA adapter models with the same base model, and have rate limit of 200. Those 2 LoRA adapter models combined will have rate limit of 200.
Pricing is 50% higher than base model.
LoRA adapter model speed is lower than base model, because there is additional compute and memory overhead to apply the LoRA adapter. From our benchmarks, the LoRA adapter model speed is about 50-60% slower than base model.
You could merge the LoRA adapter with the base model to reduce the overhead. And use custom deployment, the speed will be close to the base model.
Nemotron 3 Nano Explained: NVIDIA’s Efficient Small LLM and Why It Matters<p>The open-source LLM space has exploded with models competing across size, efficiency, and reasoning capability. But while frontier models dominate headlines with enormous parameter counts, a different category has quietly become essential for real-world deployment: small yet high-performance models optimized for edge devices, private on-prem systems, and cost-sensitive applications. NVIDIA’s Nemotron family brings together open […]</p>
NVIDIA Nemotron 3 Nano 30B API Benchmarks: Latency & Cost<p>About NVIDIA Nemotron 3 Nano 30B A3B NVIDIA Nemotron 3 Nano 30B A3B is a large language model trained from scratch by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It is part of the Nemotron 3 family — NVIDIA’s most efficient family of open models, built for agentic AI applications. […]</p>
Chat with books using DeepInfra and LlamaIndexAs DeepInfra, we are excited to announce our integration with LlamaIndex.
LlamaIndex is a powerful library that allows you to index and search documents
using various language models and embeddings. In this blog post, we will show
you how to chat with books using DeepInfra and LlamaIndex.
We will ...© 2026 DeepInfra. All rights reserved.