Browse deepinfra models:

All categories and models you can try out and use directly on Deep Infra:

text-generation

automatic-speech-recognition

zero-shot-image-classification

featured

text-generation

MiniMaxAI/MiniMax-M2.5

MiniMax M2.5 is SOTA in coding, agentic tool use and search, office work, and a range of other economically valuable tasks, boasting scores of 80.2% in SWE-Bench Verified, 51.3% in Multi-SWE-Bench, and 76.3% in BrowseComp (with context management).

$0.03 cached, $0.27 in, $0.95 out / 1M
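The per-1M-token prices shown for each model can be turned into a rough per-request cost estimate. The sketch below is illustrative only: `Price` and `estimate_cost` are hypothetical names, the token counts are made up, and it assumes that "cached" is a discounted rate applied to the cached portion of the input tokens (the listing does not spell this out). The rates in the example are the MiniMax-M2.5 figures listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    """Listed prices in USD per 1M tokens (hypothetical helper type)."""
    input_per_m: float
    output_per_m: float
    cached_per_m: Optional[float] = None  # None when no cached rate is listed


def estimate_cost(price: Price, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of a request.

    Assumption: cached_tokens is the subset of input_tokens billed at the
    cached rate; if no cached rate is listed, they are billed at the
    regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    cached_rate = (price.cached_per_m
                   if price.cached_per_m is not None
                   else price.input_per_m)
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * price.input_per_m
             + cached_tokens * cached_rate
             + output_tokens * price.output_per_m)
    return total / 1_000_000


if __name__ == "__main__":
    # Rates as listed: $0.03 cached, $0.27 in, $0.95 out / 1M
    minimax_m25 = Price(input_per_m=0.27, output_per_m=0.95, cached_per_m=0.03)
    # Made-up workload: 2M input tokens (500k of them cached), 100k output tokens
    cost = estimate_cost(minimax_m25, 2_000_000, 100_000, cached_tokens=500_000)
    print(f"Estimated cost: ${cost:.4f}")
```

For models that list a single flat rate (for example "$0.40 / 1M tokens"), the same function can be used by passing that rate as both `input_per_m` and `output_per_m`, again assuming the flat rate covers both directions.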

featured

text-generation

moonshotai/Kimi-K2.5

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. It seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.

$0.07 cached, $0.45 in, $2.25 out / 1M

featured

text-generation

zai-org/GLM-4.7-Flash

GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.

$0.01 cached, $0.06 in, $0.40 out / 1M

featured

text-generation

nvidia/Nemotron-3-Nano-30B-A3B

NVIDIA Nemotron 3 Nano is an open reasoning model optimized for fast, cost-efficient inference. Built with a hybrid MoE and Mamba architecture and trained on NVIDIA-curated synthetic reasoning data, it delivers strong multi-step reasoning with stable latency and predictable performance for agentic and production workloads.

$0.05 in, $0.20 out / 1M

featured

text-generation

deepseek-ai/DeepSeek-V3.2

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.

$0.13 cached, $0.26 in, $0.38 out / 1M

text-generation

ByteDance/Seed-1.8

Seed-1.8 is optimized specifically for multimodal agent scenarios. It features enhanced agent capabilities, upgraded multimodal comprehension, and more flexible context management.

$0.05 cached, $0.25 in, $2.00 out / 1M

text-generation

ByteDance/Seed-2.0-mini

Built for low-latency, high-concurrency, cost-sensitive use cases, with flexible deployment, four-tier thinking, and multimodal

$0.02 cached, $0.10 in, $0.40 out / 1M

text-generation

Gryphe/MythoMax-L2-13b

$0.40 / 1M tokens

text-generation

MiniMaxAI/MiniMax-M2.1

MiniMax-M2.1 is a model optimized specifically for robustness in coding, tool use, instruction following, and long-horizon planning. From automating multilingual software development to executing complex, multi-step office workflows, MiniMax-M2.1 empowers developers to build the next generation of autonomous applications—all while being fully transparent, controllable, and accessible.

$0.03 cached, $0.27 in, $0.95 out / 1M

text-generation

NousResearch/Hermes-3-Llama-3.1-405B

Hermes 3 is a cutting-edge language model that offers advanced capabilities in roleplaying, reasoning, and conversation. It's a fine-tuned version of the Llama-3.1 405B foundation model, designed to align with user needs and provide powerful control. Key features include reliable function calling, structured output, generalist assistant capabilities, and improved code generation. Hermes 3 is competitive with Llama-3.1 Instruct models, with its own strengths and weaknesses.

$1.00 / 1M tokens

text-generation

NousResearch/Hermes-3-Llama-3.1-70B

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

$0.30 / 1M tokens

text-generation

PaddlePaddle/PaddleOCR-VL-0.9B

PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios.

$0.14 in, $0.80 out / 1M

text-generation

Qwen/Qwen2.5-72B-Instruct

Qwen2.5 is a model pretrained on a large-scale dataset of up to 18 trillion tokens, offering significant improvements in knowledge, coding, mathematics, and instruction following compared to its predecessor Qwen2. The model also features enhanced capabilities in generating long texts, understanding structured data, and generating structured outputs, while supporting multilingual capabilities for over 29 languages.

$0.12 in, $0.39 out / 1M

text-generation

Qwen/Qwen2.5-VL-32B-Instruct

$0.20 in, $0.60 out / 1M

text-generation

Qwen/Qwen3-14B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

$0.12 in, $0.24 out / 1M

text-generation

Qwen/Qwen3-235B-A22B-Instruct-2507

Qwen3-235B-A22B-Instruct-2507 is the updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.

$0.071 in, $0.10 out / 1M

text-generation

Qwen/Qwen3-235B-A22B-Thinking-2507

Qwen3-235B-A22B-Thinking-2507 is a new Qwen3 model that scales up the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.

$0.20 cached, $0.23 in, $2.30 out / 1M

text-generation

Qwen/Qwen3-30B-A3B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

$0.08 in, $0.28 out / 1M

text-generation

Qwen/Qwen3-32B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

$0.08 in, $0.28 out / 1M

text-generation

Qwen/Qwen3-Coder-480B-A35B-Instruct

Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model, featuring significant performance on agentic coding, agentic browser use, and other foundational coding tasks, achieving results comparable to Claude Sonnet.

$0.40 in, $1.60 out / 1M

text-generation

Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo

Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model, featuring significant performance on agentic coding, agentic browser use, and other foundational coding tasks, achieving results comparable to Claude Sonnet.

$0.022 cached, $0.22 in, $1.00 out / 1M

text-generation

Qwen/Qwen3-Max

Qwen3-Max is the latest flagship model in the Qwen family, with state-of-the-art results across a comprehensive suite of benchmarks, including knowledge, reasoning, coding, instruction following, human preference alignment, agent tasks, and multilingual understanding.

$0.24 cached, $1.20 in, $6.00 out / 1M

text-generation

Qwen/Qwen3-Max-Thinking

Qwen3-Max-Thinking is the latest flagship reasoning model in the Qwen3 family, further enhanced by innovations such as adaptive tool use and advanced test-time scaling techniques.

$0.24 cached, $1.20 in, $6.00 out / 1M

text-generation

Qwen/Qwen3-Next-80B-A3B-Instruct

Over the past few months, we have observed increasingly clear trends toward scaling both total parameters and context lengths in the pursuit of more powerful and agentic artificial intelligence (AI). We are excited to share our latest advancements in addressing these demands, centered on improving scaling efficiency through innovative model architecture. We call these next-generation foundation models Qwen3-Next.

$0.09 in, $1.10 out / 1M

© 2026 Deep Infra. All rights reserved.
