Nemotron 3 Nano Omni — the first multimodal model in the Nemotron 3 family, now on DeepInfra!
Kimi K2.5 API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About Kimi K2.5: Kimi K2.5 is Moonshot AI’s flagship open-source reasoning model, released in January 2026. It is a native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens. The model features a Mixture-of-Experts (MoE) architecture with 1 trillion total parameters and 32 billion activated parameters. Kimi K2.5 […]
MiniMax-M2.5 API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
About MiniMax-M2.5: MiniMax-M2.5 is a state-of-the-art open-weights large language model released in February 2026. Built on a 230B-parameter Mixture of Experts (MoE) architecture with approximately 10 billion active parameters per forward pass, it features Lightning Attention and supports a context window of up to 205,000 tokens. The model uses extended chain-of-thought reasoning to work through […]
GLM-5 API Benchmarks: Latency, Throughput & Cost
Published on 2026.04.03 by DeepInfra
GLM-5 is the latest open-weights reasoning model released by Z AI (Zhipu AI) in February 2026, characterized by high “thinking token” usage. It is a Mixture of Experts (MoE) model with 744B total parameters and 40B active parameters, scaling up from GLM-4.5’s 355B parameters. The model was pre-trained on 28.5T tokens and features a 200K+ […]
Introducing Nemotron 3 Super on DeepInfra
Published on 2026.03.11 by Aray Sultanbekova
DeepInfra is an official launch partner for NVIDIA Nemotron 3 Super, the latest open model in the Nemotron family, purpose-built for complex multi-agent applications with a 1M token context window and hybrid MoE architecture.
Building Efficient AI Inference on NVIDIA Blackwell Platform
Published on 2026.02.12 by DeepInfra
DeepInfra delivers up to 20x cost reductions on NVIDIA Blackwell by combining MoE architectures, NVFP4 quantization, and inference optimizations — with a Latitude case study.
Function Calling in DeepInfra: Extend Your AI with Real-World Logic
Published on 2026.02.02 by DeepInfra
Modern large language models (LLMs) are incredibly powerful at understanding and generating text, but until recently they were largely static: they could respond only based on patterns in their training data. Function calling changes that. It lets language models interact with external logic — your own code, APIs, utilities, or business systems — while still […]
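As a minimal sketch of the function-calling flow the teaser describes: you declare a tool schema the model can see, the model replies with a structured tool call, and your code executes it and returns the result. The `get_weather` tool, its parameters, and the `dispatch` helper below are illustrative assumptions, not from the post itself; the schema shape follows the widely used OpenAI-compatible `tools` format.

```python
import json

# Illustrative tool schema (assumed example, OpenAI-compatible "tools" shape):
# this is what the model is shown so it knows the function exists.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stub implementation; real external logic would call a weather API here.
    return f"Sunny in {city}"

def dispatch(tool_call: dict) -> str:
    """Execute the function the model asked for and return its result."""
    handlers = {"get_weather": get_weather}
    name = tool_call["function"]["name"]
    # The model sends arguments as a JSON string, so parse before calling.
    args = json.loads(tool_call["function"]["arguments"])
    return handlers[name](**args)

# Simulated tool call, shaped like one field of a chat-completion response.
call = {"function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'}}
print(dispatch(call))
```

In a live integration, the string returned by `dispatch` would be appended to the conversation as a `tool`-role message so the model can compose its final answer from it.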
© 2026 Deep Infra. All rights reserved.