nvidia/
$0.05 / 1M input tokens · $0.20 / 1M output tokens
NVIDIA Nemotron 3 Nano is an open small reasoning model optimized for fast, cost-efficient inference in agentic and production workloads. Built on a hybrid Mixture-of-Experts (MoE) Mamba-Transformer architecture, it delivers strong multi-step reasoning, high token throughput, stable latency with predictable cost, and efficient deployment for agent-based systems. It is designed for real-world AI systems where reasoning models generate significantly more output tokens per prompt: Nemotron Nano reduces compute cost while maintaining strong reasoning quality.
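Because reasoning traces inflate output-token counts, per-request cost is dominated by the output rate. A minimal sketch of a cost estimator using the listed rates ($0.05 in / $0.20 out per 1M tokens); the function name and example token counts are illustrative, not part of any DeepInfra SDK:

```python
# Listed per-1M-token rates for Nemotron 3 Nano on this page
PRICE_IN_PER_M = 0.05   # USD per 1M input tokens
PRICE_OUT_PER_M = 0.20  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the listed rates."""
    return (input_tokens * PRICE_IN_PER_M
            + output_tokens * PRICE_OUT_PER_M) / 1_000_000

# Example: a 2,000-token prompt whose reasoning trace emits 8,000 tokens.
# Output tokens cost 4x input tokens here, so they dominate the bill.
print(f"${estimate_cost(2_000, 8_000):.4f}")  # → $0.0017
```

At these rates a request only becomes expensive when the reasoning trace is long, which is why throughput-optimized small models target agentic workloads.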

© 2026 DeepInfra. All rights reserved.