Gemma 4 is Google DeepMind’s latest family of open-weight models, released on April 3, 2026 under the Apache 2.0 license. The family spans four model sizes — from edge-optimized variants for mobile devices to a 31B dense model for server-side deployments — with every model supporting multimodal input, built-in reasoning, and a context window of […]

Published on 2026.05.25 by DeepInfra
Gemma 4 26B A4B API Benchmarks: Latency, Throughput & Cost
As of May 2026, seven API providers offer access to Gemma 4 26B A4B, and the spread in performance and cost is wide enough to matter in production. Blended pricing ranges from $0.00 (Google AI Studio free tier) to $0.70 per 1M tokens, TTFT spans 0.68s to 5.51s, and output speed varies by nearly 5x […]

Published on 2026.05.25 by DeepInfra
Gemma 4 Pricing, Benchmarks & Real-World Cost Analysis
Gemma 4 puts a serious open-weight reasoning model into a genuinely competitive provider market. The same Gemma 4 26B A4B model is available across seven API providers, with blended pricing ranging from $0.10 to $0.70 per 1M tokens — real variation that changes production economics. Released April 3, 2026 by Google DeepMind under Apache 2.0, […]

Published on 2026.05.25 by DeepInfra
Gemma 4 on DeepInfra: Fast & Scalable Open AI Models
Google DeepMind’s Gemma 4 scored 88.3% on AIME 2026 mathematics benchmarks in its 26B MoE variant — compared to 20.8% for its predecessor, Gemma 3 27B. That’s not an incremental update. The family spans four model sizes designed for hardware targets as different as a Raspberry Pi and a consumer GPU workstation, with every model […]

Published on 2026.05.25 by DeepInfra
Best API Providers for GLM-5.1 in 2026
GLM-5.1 is available across a growing number of API providers, and the choice between them materially affects cost, latency, and what features you can actually use. The benchmark spread is real: blended pricing runs from $0.74 to $1.70 per 1M tokens across tracked providers, output speed ranges from 33 to 175 t/s, and not every […]

Published on 2026.05.25 by DeepInfra
GLM-5.1 Model Overview: Features, Capabilities & Use Cases
GLM-5.1 is Z.AI’s next-generation flagship model for agentic engineering, released on April 7, 2026 under the MIT license. It is a 754-billion-parameter Mixture-of-Experts model with 40 billion active parameters per token, a 202,752-token context window, and up to 131K output tokens. The model is the direct successor to GLM-5, designed specifically for long-horizon autonomous […]

Published on 2026.05.25 by DeepInfra
GLM-5.1 API Benchmarks: Latency, Throughput & Cost
Z.ai’s GLM-5.1 is an April 2026 open-weight reasoning model built for long-horizon agentic engineering — and accessing it effectively means navigating a real spread of provider options. Across 10 benchmarked API providers, blended pricing ranges from $0.74 to $1.70 per 1M tokens, output speed from 33.8 to 175.2 t/s, and the fastest provider is 5.2x […]
© 2026 DeepInfra. All rights reserved.