nvidia/
$0.50 in, $2.20 out, $0.10 cached / 1M tokens
| Tier | Input | Output | Cached input |
|---|---|---|---|
| Priority (1.5×) | $0.75 | $3.30 | $0.15 |
per 1M tokens
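
The per-token arithmetic behind these prices can be sketched as follows. This is a rough estimate under stated assumptions, not DeepInfra's billing logic: the function name `estimate_cost` and its parameters are hypothetical, and it assumes that cached input tokens are billed at the cached rate in place of the regular input rate, and that the priority tier multiplies every rate by 1.5, as the table suggests.

```python
from decimal import Decimal

# Base prices in USD per 1M tokens, copied from the listing above.
BASE_PRICES = {
    "input": Decimal("0.50"),
    "output": Decimal("2.20"),
    "cached": Decimal("0.10"),
}
PRIORITY_MULTIPLIER = Decimal("1.5")  # the "Priority (1.5×)" tier
TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def estimate_cost(input_tokens, output_tokens, cached_tokens=0, priority=False):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: input_tokens counts only uncached input tokens, and
    cached_tokens are billed at the cached rate instead of the input rate.
    """
    multiplier = PRIORITY_MULTIPLIER if priority else Decimal(1)
    total = (
        BASE_PRICES["input"] * input_tokens
        + BASE_PRICES["output"] * output_tokens
        + BASE_PRICES["cached"] * cached_tokens
    )
    return total * multiplier / TOKENS_PER_UNIT
```

Under these assumptions, a request with 200,000 uncached input tokens, 50,000 output tokens, and 800,000 cached input tokens comes to $0.29 at the base rate and $0.435 on the priority tier. `Decimal` is used rather than `float` so that currency amounts are not distorted by binary rounding.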
Nemotron 3 Ultra is built for frontier reasoning, orchestration, coding agents, deep research, and complex enterprise workflows. It delivers up to 5x faster inference and up to 30% lower cost for agentic workloads, and supports a context window of up to 1M tokens.

© 2026 DeepInfra. All rights reserved.