Qwen3.5 9B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 9B: Qwen3.5 9B is the flagship of Alibaba’s Qwen3.5 Small Model Series, released on March 2, 2026. It is a multimodal model combining Gated Delta Networks (a form of linear attention) with a sparse Mixture-of-Experts system, enabling higher throughput and lower latency during inference compared to traditional dense architectures. The architecture utilizes […]
Qwen3.5 27B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 27B (Reasoning): Qwen3.5 27B is part of Alibaba Cloud’s latest-generation foundation model family, released in February 2026. Unlike the Mixture-of-Experts variants in the Qwen3.5 series, the 27B model uses a dense architecture combining Gated Delta Networks and Feed Forward Networks. It achieves strong benchmark scores including MMLU-Pro (86.1%), GPQA Diamond (85.5%), and SWE-bench […]
Qwen3.5 35B A3B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 35B A3B: Qwen3.5 35B A3B is a native vision-language model released by Alibaba Cloud in February 2026. It uses a hybrid architecture that integrates Gated Delta Networks with a sparse Mixture-of-Experts model, achieving higher inference efficiency. With 35 billion total parameters and only 3 billion activated per token through 256 experts (8 routed […]
Qwen3.5 122B A10B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 122B A10B: Qwen3.5 122B A10B is Alibaba Cloud’s mid-tier multimodal foundation model, released in February 2026. It is a multimodal vision-language Mixture-of-Experts model supporting text, image, and video inputs, designed for native multimodal agent applications. It features 122 billion total parameters with 10 billion activated per token through a hybrid architecture that integrates […]
Qwen3.5 397B A17B API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Qwen3.5 397B A17B: Qwen3.5 397B A17B is Alibaba Cloud’s largest and most capable multimodal foundation model, released in February 2026. It features a hybrid Mixture-of-Experts (MoE) architecture with 397 billion total parameters and 17 billion active parameters per inference pass, utilizing 512 experts with a routing mechanism selecting a subset per token. This sparse […]
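The sparse-activation pattern described across these models (many experts, only a few routed per token, so active parameters stay far below the total) can be sketched with a minimal top-k router. This is an illustrative sketch, not the actual Qwen3.5 implementation: the top-k value, hidden size, and router design here are assumptions chosen for clarity, and only the 512-expert count comes from the post above.

```python
# Minimal sketch of top-k expert routing in a sparse Mixture-of-Experts layer.
# Assumptions (not from the post): TOP_K = 8, D_MODEL = 64, a plain linear router.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 512   # total experts, as stated for Qwen3.5 397B A17B
TOP_K = 8           # hypothetical number of routed experts per token
D_MODEL = 64        # toy hidden size for illustration

# Router: a linear projection from the token representation to one logit per expert.
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) / np.sqrt(D_MODEL)

def route(token: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Select the top-k experts for one token and softmax-normalize their weights."""
    logits = token @ router_w                        # shape (NUM_EXPERTS,)
    top = np.argpartition(logits, -TOP_K)[-TOP_K:]   # indices of the k largest logits
    w = np.exp(logits[top] - logits[top].max())      # stable softmax over the top-k
    return top, w / w.sum()

token = rng.standard_normal(D_MODEL)
experts, weights = route(token)
# Only TOP_K experts run for this token; their mixture weights sum to 1.
print(len(experts), round(float(weights.sum()), 6))
```

Because only the routed experts' feed-forward weights are touched per token, the compute cost scales with the active-parameter count (e.g. 17B of 397B here) rather than the full model size, which is the efficiency argument these model cards repeat.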
Step 3.5 Flash API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Step 3.5 Flash: Step 3.5 Flash is an open-weights reasoning model released in February 2026 by StepFun. It leverages a sparse Mixture of Experts (MoE) architecture with 196 billion total parameters and only 11 billion active parameters per token during inference — delivering state-of-the-art performance at a fraction of the cost of dense models. […]
© 2026 Deep Infra. All rights reserved.