nvidia/
$0.10 in / $0.40 out per 1M tokens
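As a rough illustration of how the per-token pricing above translates into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the example token counts are hypothetical, and the calculation assumes simple linear billing at the listed rates with no minimums, discounts, or caching.

```python
# Listed prices in USD per 1M tokens (taken from the pricing line above).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming linear per-token billing.

    `estimate_cost` is a hypothetical helper for illustration, not part of any API.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 2,000-token prompt with a 500-token completion.
    print(f"${estimate_cost(2_000, 500):.6f}")
```

Under these assumptions, a 2,000-token prompt with a 500-token completion comes to roughly $0.0004, with input and output contributing equally because output tokens cost four times as much.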
Llama-3.3-Nemotron-Super-49B-v1.5 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. It is derived from Meta's Llama-3.3-70B-Instruct and uses a Neural Architecture Search (NAS) approach to improve efficiency and reduce memory requirements.

© 2026 Deep Infra. All rights reserved.