Gemma 4 Pricing, Benchmarks & Real-World Cost Analysis
Published on 2026.05.25 by DeepInfra

Gemma 4 puts a serious open-weight reasoning model into a genuinely competitive provider market. The same Gemma 4 26B A4B model is available across seven API providers, with blended pricing ranging from $0.10 to $0.70 per 1M tokens — real variation that changes production economics. Released April 3, 2026 by Google DeepMind under Apache 2.0, […]

Gemma 4 on DeepInfra: Fast & Scalable Open AI Models
Published on 2026.05.25 by DeepInfra

Google DeepMind’s Gemma 4 scored 88.3% on AIME 2026 mathematics benchmarks in its 26B MoE variant — compared to 20.8% for its predecessor, Gemma 3 27B. That’s not an incremental update. The family spans four model sizes designed for hardware targets as different as a Raspberry Pi and a consumer GPU workstation, with every model […]

Best API Providers for GLM-5.1 in 2026
Published on 2026.05.25 by DeepInfra

GLM-5.1 is available across a growing number of API providers, and the choice between them materially affects cost, latency, and what features you can actually use. The benchmark spread is real: blended pricing runs from $0.74 to $1.70 per 1M tokens across tracked providers, output speed ranges from 33 to 175 t/s, and not every […]

GLM-5.1 Model Overview: Features, Capabilities & Use Cases
Published on 2026.05.25 by DeepInfra

GLM-5.1 is Z.ai’s next-generation flagship model for agentic engineering, released on April 7, 2026 under the MIT license. It is a 754-billion-parameter Mixture-of-Experts model with 40 billion active parameters per token, a 202,752-token context window, and up to 131K output tokens. The model is the direct successor to GLM-5, designed specifically for long-horizon autonomous […]

GLM-5.1 API Benchmarks: Latency, Throughput & Cost
Published on 2026.05.25 by DeepInfra

Z.ai’s GLM-5.1 is an April 2026 open-weight reasoning model built for long-horizon agentic engineering — and accessing it effectively means navigating a real spread of provider options. Across 10 benchmarked API providers, blended pricing ranges from $0.74 to $1.70 per 1M tokens, output speed from 33.8 to 175.2 t/s, and the fastest provider is 5.2x […]

GLM-5.1 Pricing Guide: API Cost Comparison & Analysis
Published on 2026.05.25 by DeepInfra

Provider choice for GLM-5.1 is a real economic decision. Across 10 benchmarked API providers, blended pricing runs from $0.74 to $1.70 per 1M tokens, output speed from 33.8 to 175.2 t/s, and the fastest provider is 5.2x quicker than the slowest. For teams deploying at scale, that spread determines whether this model fits a production […]