GLM-5.1 - state-of-the-art agentic engineering, now available on DeepInfra!

You can browse available text-to-image models on the models page.
For example, we'll use runwayml/stable-diffusion-v1-5.
curl -X POST \
-d '{"prompt": "A photo of a cube floating in space"}' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer YOUR_API_KEY" \
-o cube.jpg \
'https://api.deepinfra.com/v1/inference/runwayml/stable-diffusion-v1-5'
And check out the output in cube.jpg.
You can check all the available settings on the model page or via the API documentation tab.
MiniMax-M2.5 API Benchmarks: Latency, Throughput & Cost<p>About MiniMax-M2.5 MiniMax-M2.5 is a state-of-the-art open-weights large language model released in February 2026. Built on a 230B-parameter Mixture of Experts (MoE) architecture with approximately 10 billion active parameters per forward pass, it features Lightning Attention and supports a context window of up to 205,000 tokens. The model uses extended chain-of-thought reasoning to work through […]</p>
NVIDIA Nemotron 3 Nano 30B API Benchmarks: Latency & Cost<p>About NVIDIA Nemotron 3 Nano 30B A3B NVIDIA Nemotron 3 Nano 30B A3B is a large language model trained from scratch by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It is part of the Nemotron 3 family — NVIDIA’s most efficient family of open models, built for agentic AI applications. […]</p>
Qwen3.5 4B via DeepInfra: Latency, Throughput & Cost<p>About Qwen3.5 4B (Reasoning) Qwen3.5 4B is a compact 4-billion parameter open-weights model released in March 2026 as part of Alibaba Cloud’s Qwen3.5 Small Model Series. It employs an Efficient Hybrid Architecture combining Gated Delta Networks (a form of linear attention) with sparse Mixture-of-Experts, delivering high-throughput inference with minimal latency overhead — a significant architectural […]</p>
© 2026 Deep Infra. All rights reserved.