A Milestone on Our Journey Building Deep Infra and Scaling Open Source AI Infrastructure
Published on 2025.04.22 by Yessen Kanapin, Co-Founder of DeepInfra

Today we're excited to share that Deep Infra has raised $18 million in Series A funding, led by Felicis and by our earliest believer and advisor, Georges Harik.

When we founded Deep Infra in 2022, we saw a clear gap: while enormous resources were being poured into training AI models, the infrastructure needed to run these models in production was lagging behind.

The past two years have been a whirlwind. We've scaled our processing volume by over 8,000x since our seed stage. What started as a bet on AI infrastructure has quickly become a critical service for developers deploying increasingly sophisticated models.

Our growth accelerated following the emergence of "thinking models" like DeepSeek. These open-source alternatives demonstrated that the innovation cycle in AI was moving even faster than anticipated, and that the newest models require significantly more computation during inference.

The reality of deploying modern AI models is challenging for most organizations. Running these models requires significant compute resources, specialized hardware like GPUs that are difficult to acquire, and deep expertise in infrastructure optimization. Most companies simply can't afford the investment or overcome the supply chain challenges to build this infrastructure themselves.

This challenge has shaped our approach from day one. Having spent years scaling systems to hundreds of millions of users before founding this company, we developed a set of core principles that guide how we build Deep Infra:

  1. We believe reliability is non-negotiable when your service powers critical applications. We design for zero downtime because AI infrastructure must be as dependable as the electricity powering your office.
  2. We've learned that performance creates competitive advantage. Our obsession with fast time-to-first-token and optimal GPU utilization isn't technical vanity – it directly impacts our customers' user experience and cost structure.
  3. We're convinced that privacy builds lasting trust. Our strict no-logging policy for user prompts isn't just a feature – it's a fundamental commitment to our customers' data sovereignty.
  4. And we know that deep infrastructure expertise matters at every layer. Understanding the full stack from hardware to application allows us to deliver superior performance while controlling costs.

These principles have guided our approach as we've expanded our computing capacity, recently receiving a large shipment of NVIDIA Blackwell GPUs with more on order to support our rapid growth. You can see how this funding injection will be put to good use.

To our customers who have trusted us with their production workloads: thank you. We're just getting started as we continue building the infrastructure that powers the next generation of AI applications.

Follow us on X (formerly Twitter) and LinkedIn to stay updated on our journey. We look forward to sharing more exciting developments in the coming months.
