

Latest article

Gemma 4 Model Overview: Features, Architecture & Use Cases
Published on 2026.05.25 by DeepInfra

Gemma 4 is Google DeepMind’s latest family of open-weight models, released on April 3, 2026 under the Apache 2.0 license. The family spans four model sizes — from edge-optimized variants for mobile devices to a 31B dense model for server-side deployments — with every model supporting multimodal input, built-in reasoning, and a context window of […]

Recent articles
Gemma 4 26B A4B API Benchmarks: Latency, Throughput & Cost
Published on 2026.05.25 by DeepInfra

As of May 2026, seven API providers offer access to Gemma 4 26B A4B, and the spread in performance and cost is wide enough to matter in production. Blended pricing ranges from $0.00 (Google AI Studio free tier) to $0.70 per 1M tokens, TTFT spans 0.68s to 5.51s, and output speed varies by nearly 5x […]

Gemma 4 Pricing, Benchmarks & Real-World Cost Analysis
Published on 2026.05.25 by DeepInfra

Gemma 4 puts a serious open-weight reasoning model into a genuinely competitive provider market. The same Gemma 4 26B A4B model is available across seven API providers, with blended pricing ranging from $0.10 to $0.70 per 1M tokens — real variation that changes production economics. Released April 3, 2026 by Google DeepMind under Apache 2.0, […]

Gemma 4 on DeepInfra: Fast & Scalable Open AI Models
Published on 2026.05.25 by DeepInfra

Google DeepMind’s Gemma 4 scored 88.3% on the AIME 2026 mathematics benchmark in its 26B MoE variant, compared to 20.8% for its predecessor, Gemma 3 27B. That’s not an incremental update. The family spans four model sizes designed for hardware targets as different as a Raspberry Pi and a consumer GPU workstation, with every model […]

Best API Providers for GLM-5.1 in 2026
Published on 2026.05.25 by DeepInfra

GLM-5.1 is available across a growing number of API providers, and the choice between them materially affects cost, latency, and what features you can actually use. The benchmark spread is real: blended pricing runs from $0.74 to $1.70 per 1M tokens across tracked providers, output speed ranges from 33 to 175 t/s, and not every […]

GLM-5.1 Model Overview: Features, Capabilities & Use Cases
Published on 2026.05.25 by DeepInfra

GLM-5.1 is Z.ai’s next-generation flagship model for agentic engineering, released on April 7, 2026 under the MIT license. It is a 754-billion-parameter Mixture-of-Experts model with 40 billion active parameters per token, a 202,752-token context window, and up to 131K output tokens. The model is the direct successor to GLM-5, designed specifically for long-horizon autonomous […]

GLM-5.1 API Benchmarks: Latency, Throughput & Cost
Published on 2026.05.25 by DeepInfra

Z.ai’s GLM-5.1 is an April 2026 open-weight reasoning model built for long-horizon agentic engineering — and accessing it effectively means navigating a real spread of provider options. Across 10 benchmarked API providers, blended pricing ranges from $0.74 to $1.70 per 1M tokens, output speed from 33.8 to 175.2 t/s, and the fastest provider is 5.2x […]