
At the end of 2023, the gap between open-weight and closed-source AI models was real and easy to describe. If you wanted the best performance on reasoning, language understanding, or multi-step problem solving, you paid for a proprietary API. Open models were useful, capable for many tasks, and dramatically cheaper to run, but they were not considered production-grade alternatives to GPT-4 for anything that required frontier intelligence.
That assessment is no longer accurate. The convergence that practitioners had been observing informally became quantifiable by early 2026: the Stanford AI Index 2025 Report documented that a 17.5-percentage-point gap between the best US and Chinese models on MMLU had effectively reached zero. Because the strongest Chinese models, DeepSeek and Qwen among them, ship as open weights, that result is largely a measure of open-versus-closed convergence as well. On math benchmarks including MATH-500 and AIME, open models now lead the field outright. On graduate-level science reasoning (GPQA Diamond), they are competitive with all but the most expensive frontier options.
This article covers how that happened, which domains still have a meaningful closed-model advantage, and what it means for teams deciding where to route their workloads.
The trajectory accelerated in phases rather than linearly.
The convergence is most complete on knowledge and reasoning benchmarks that dominated the 2023 and 2024 AI evaluation landscape.
The remaining advantage for closed models is concentrated in a specific set of tasks. It is worth naming them precisely rather than gesturing at a general frontier gap that no longer describes most workloads.
Benchmark convergence understates the actual competitive shift because it does not capture ecosystem dynamics.
With 113,000 derivative models on Hugging Face, the Qwen base has been fine-tuned for more specific use cases than any other model family. That kind of derivative ecosystem compounds in ways that benchmark scores cannot capture: domain-specific fine-tunes, quantization work, deployment tooling, and community documentation all accumulate on top of a popular base. Alibaba has more derivative models on Hugging Face than Google and Meta combined. That is a structural moat, not just a quality signal.
The same effect applies to infrastructure. Open-weight models can be served by any inference provider, which intensifies price competition, pushing prices down and availability up. DeepSeek V3 and its variants are available on dozens of providers simultaneously. Closed models are available only from their originating labs or authorized resellers, and pricing reflects that monopoly on supply. For high-volume workloads, the cost differential between open and closed models ranges from 4x to 30x, depending on the specific models compared, and that gap is structural rather than temporary.
There is also a geographic dimension. Chinese models now dominate open-source download rankings globally. The shift from US-dominant to China-dominant downloads happened in the summer of 2025, according to the ATOM Project’s tracking of Hugging Face data. Whether that represents a long-term reorientation of the ecosystem or a temporary advantage from a wave of competitive releases remains to be seen. But as of mid-2026, the open-source frontier is defined primarily by labs in China, not Silicon Valley.
For most production workloads (document analysis, structured output generation, RAG pipelines, multilingual processing, summarization, classification), the decision between open and closed is no longer a quality decision. It is a cost and control decision. Open models are the economically rational default for these use cases, and any quality difference on a specific task needs to be measured rather than assumed.
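One way to act on "measured rather than assumed" is to score candidate models on a small labeled sample of your own task before committing. The sketch below is a minimal harness, assuming a classification-style task where exact match is a sensible metric. The `Model` callables are hypothetical stand-ins for whatever client you use to call an open or closed endpoint; no specific provider API is assumed, and the sample data and stub models are illustrative only.

```python
from typing import Callable, Dict, Sequence, Tuple

# A "model" is any function from prompt to completion text. In practice it
# would wrap an API client; that wrapper is assumed here, not shown.
Model = Callable[[str], str]


def normalize(text: str) -> str:
    """Trim whitespace and lowercase so trivial formatting differences don't count as errors."""
    return text.strip().lower()


def exact_match_accuracy(model: Model, examples: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (prompt, expected_answer) pairs the model answers exactly."""
    if not examples:
        raise ValueError("need at least one labeled example")
    hits = sum(
        1 for prompt, expected in examples
        if normalize(model(prompt)) == normalize(expected)
    )
    return hits / len(examples)


def compare_models(
    models: Dict[str, Model], examples: Sequence[Tuple[str, str]]
) -> Dict[str, float]:
    """Score every candidate on the same labeled sample."""
    return {name: exact_match_accuracy(m, examples) for name, m in models.items()}


if __name__ == "__main__":
    # Hypothetical sentiment-classification sample for illustration.
    sample = [
        ("The update fixed every bug I reported.", "positive"),
        ("Support never answered my ticket.", "negative"),
        ("Setup took five minutes and just worked.", "positive"),
    ]

    # Hypothetical stub models standing in for real API calls.
    def stub_open(prompt: str) -> str:
        return "negative" if "never" in prompt else "positive"

    def stub_closed(prompt: str) -> str:
        return "positive"

    print(compare_models({"open-candidate": stub_open, "closed-candidate": stub_closed}, sample))
```

Exact match is deliberately the simplest metric; for free-form outputs such as summaries you would swap `exact_match_accuracy` for a task-appropriate scorer while keeping the same compare-on-one-sample structure.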
For workloads at the edge of what models can do, closed frontier models maintain a real advantage. That advantage is narrowing with each release cycle, and the lag time between a capability appearing in closed models and a competitive open alternative is measured in months rather than years.
The practical takeaway is that the default assumption should now run in the opposite direction from 2023. The right question is no longer “why would I use an open model?” but “why do I specifically need a closed one?” For a growing share of real workloads, there is no good answer to the second question.
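The inverted default can be written down directly as a routing rule: every task goes to an open model unless someone has recorded a specific reason it needs a closed one. This is a hypothetical sketch; the model identifiers, task names, and the example justification are placeholders, not recommendations for particular models.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def choose_model(
    task: str,
    closed_model_justifications: Dict[str, str],
    open_default: str,
    closed_model: str,
) -> Route:
    """Route to the open default unless a closed model has a recorded justification for this task."""
    justification = closed_model_justifications.get(task)
    if justification is None:
        return Route(open_default, "default: no documented need for a closed model")
    return Route(closed_model, justification)


if __name__ == "__main__":
    # Placeholder identifiers; substitute real model names and measured evidence.
    exceptions = {"frontier-edge-task": "open candidates trailed on our internal eval"}
    for task in ("summarization", "classification", "frontier-edge-task"):
        print(task, "->", choose_model(task, exceptions, "open-model", "closed-model"))
```

Keeping the justification as a required string makes the "why do I specifically need a closed one?" question explicit in configuration, and makes stale exceptions easy to revisit as open models catch up.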
DeepInfra serves the full range of open-weight frontier models discussed here (DeepSeek V4 Pro and Flash, Kimi K2, Qwen3, GLM-5, Llama 4, Gemma 4, and more) with H100-backed infrastructure, low and predictable time to first token (TTFT), and usage-based pricing with no contracts. DeepSeek V3.2 starts at $0.26 per million input tokens. Kimi K2 costs $0.40 per million input tokens and $2.00 per million output tokens. For the broad middle of production AI workloads, that is the math that matters.
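The math itself is simple per-token arithmetic. The sketch below computes a workload's cost from per-million-token prices, using the Kimi K2 prices quoted above ($0.40 input, $2.00 output). The monthly volumes and the `HYPOTHETICAL_CLOSED` prices are illustrative placeholders chosen for this example, not a quote for any real API; substitute your own traffic and actual prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in US dollars."""
    input_per_m: float
    output_per_m: float


def workload_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload under usage-based per-token pricing."""
    return (input_tokens / 1_000_000) * pricing.input_per_m + (
        output_tokens / 1_000_000
    ) * pricing.output_per_m


def cost_ratio(a: Pricing, b: Pricing, input_tokens: int, output_tokens: int) -> float:
    """How many times more the same workload costs under `a` than under `b`."""
    return workload_cost(a, input_tokens, output_tokens) / workload_cost(
        b, input_tokens, output_tokens
    )


# Kimi K2 prices as quoted in this article.
KIMI_K2 = Pricing(input_per_m=0.40, output_per_m=2.00)

# Hypothetical closed-model prices: placeholders only, not any provider's real pricing.
HYPOTHETICAL_CLOSED = Pricing(input_per_m=3.00, output_per_m=15.00)

if __name__ == "__main__":
    monthly_in, monthly_out = 100_000_000, 20_000_000  # example volumes
    print(f"Kimi K2: ${workload_cost(KIMI_K2, monthly_in, monthly_out):.2f}")
    ratio = cost_ratio(HYPOTHETICAL_CLOSED, KIMI_K2, monthly_in, monthly_out)
    print(f"Placeholder closed / Kimi K2: {ratio:.1f}x")
```

With 100M input and 20M output tokens, the Kimi K2 line comes to $80.00 ($40 for input, $40 for output), and the placeholder comparison works out to 7.5x. Because the ratio depends on the input/output mix as well as on list prices, it is worth running with your own traffic profile.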
Explore all available models: deepinfra.com/models
© 2026 DeepInfra. All rights reserved.