Developed by Alibaba Group's Qwen Team, Qwen is a family of state-of-the-art large language and multimodal models designed for comprehensive AI capabilities and multilingual performance. The latest Qwen3 generation features balanced model architectures including reintroduced Mixture-of-Experts (MoE) variants (Qwen3-30B-A3B and Qwen3-235B-A22B) alongside dense models up to 32B parameters, enabling efficient resource utilization through dynamic parameter activation. With support for 119 languages and dialects, hybrid thinking modes that seamlessly alternate between reasoning and instruction-following without model switching, and extended context windows (up to 1M tokens in Qwen3-2507), Qwen excels in tasks like multilingual understanding, code generation, agentic workflows, and complex problem-solving. The models utilize advanced Byte-level Byte Pair Encoding with a 151,646-token vocabulary, structured ChatML formatting for conversational interactions, and robust tool calling capabilities with parallel execution support. Available in both proprietary and open-weight versions with flexible licensing, comprehensive model variants (Base, Instruct, Thinking, and hybrid modes), and enhanced Model Context Protocol support, Qwen is ideal for developers seeking powerful, multilingual AI systems with sophisticated reasoning capabilities and minimal deployment complexity.
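The ChatML formatting mentioned above wraps each conversational turn in special tokens. A minimal sketch of that turn structure (the `<|im_start|>`/`<|im_end|>` markers are the ones Qwen's chat template uses; in practice the tokenizer's `apply_chat_template` builds this for you):

```python
# Sketch of the ChatML turn format used by Qwen chat models:
# each message becomes <|im_start|>{role}\n{content}<|im_end|>,
# and a trailing <|im_start|>assistant\n cues the model to respond.
def to_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt
    return "".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```

When calling a hosted chat-completions endpoint you pass structured `messages` instead; this formatting only matters when driving the model through a raw text-completion interface.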
Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model, delivering significant performance on agentic coding, agentic browser use, and other foundational coding tasks, with results comparable to Claude Sonnet.
Price per 1M input tokens: $0.30
Price per 1M output tokens: $1.20
Release Date: 07/26/2025
Context Size: 262,144 tokens
Quantization: fp4
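Per-request cost follows directly from the listed rates. A quick sketch of the arithmetic, using the $0.30 / $1.20 per-million-token prices above and the token counts reported in the `usage` field of a response:

```python
# Estimate request cost from the listed rates:
# $0.30 per 1M input tokens, $1.20 per 1M output tokens.
INPUT_PER_M = 0.30
OUTPUT_PER_M = 1.20

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * INPUT_PER_M + completion_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a request that used 11 prompt tokens and 25 completion tokens:
print(f"${estimate_cost(11, 25):.8f}")  # → $0.00003330
```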
# Assume openai>=1.0.0
from openai import OpenAI

# Create an OpenAI client with your DeepInfra token and endpoint
openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = openai.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
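The tool calling mentioned above uses the standard OpenAI-compatible `tools` schema. A hedged sketch of what that schema looks like (`get_weather` is a hypothetical function, purely for illustration):

```python
import json

# OpenAI-style function-calling schema; `get_weather` is a
# hypothetical tool used only to illustrate the structure.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Passed alongside messages in the same create() call as above, e.g.:
# chat_completion = openai.chat.completions.create(
#     model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
#     messages=[{"role": "user", "content": "Weather in Paris?"}],
#     tools=tools,
# )
print(json.dumps(tools, indent=2))
```

If the model decides to call a tool, the response carries `tool_calls` on the assistant message instead of plain `content`; your code executes the function and sends the result back as a `tool` role message.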
The Qwen series offers a comprehensive suite of dense and mixture-of-experts models.
| Model | Context | $ per 1M input tokens | $ per 1M output tokens |
|---|---|---|---|
| Qwen3-Coder-480B-A35B-Instruct-Turbo | 256k | $0.30 | $1.20 |
| Qwen3-Coder-480B-A35B-Instruct | 256k | $0.40 | $1.60 |
| Qwen3-235B-A22B-Thinking-2507 | 256k | $0.13 | $0.60 |
| Qwen3-235B-A22B-Instruct-2507 | 256k | $0.13 | $0.60 |
| QwQ-32B | 128k | $0.075 | $0.15 |
| Qwen3-235B-A22B | 40k | $0.13 | $0.60 |
| Qwen3-32B | 40k | $0.10 | $0.30 |
| Qwen3-30B-A3B | 40k | $0.08 | $0.29 |
| Qwen3-14B | 40k | $0.06 | $0.24 |
| Qwen2.5-72B-Instruct | 32k | $0.12 | $0.39 |
| Qwen2.5-Coder-32B-Instruct | 32k | $0.06 | $0.15 |
| Qwen2.5-7B-Instruct | 32k | $0.04 | $0.10 |
Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.