
Qwen Model Family

Developed by Alibaba Group's Qwen Team, Qwen is a family of state-of-the-art large language and multimodal models built for broad AI capability and strong multilingual performance. The latest Qwen3 generation spans reintroduced Mixture-of-Experts (MoE) variants (Qwen3-30B-A3B and Qwen3-235B-A22B) alongside dense models of up to 32B parameters, with the MoE models activating only a fraction of their parameters per token for efficient resource utilization. With support for 119 languages and dialects, hybrid thinking modes that alternate between reasoning and instruction-following without model switching, and extended context windows (up to 1M tokens in Qwen3-2507), Qwen excels at multilingual understanding, code generation, agentic workflows, and complex problem-solving. The models use byte-level Byte Pair Encoding with a 151,646-token vocabulary, structured ChatML formatting for conversational interactions, and robust tool calling with parallel execution support. Available in both proprietary and open-weight versions with flexible licensing, comprehensive model variants (Base, Instruct, Thinking, and hybrid modes), and enhanced Model Context Protocol support, Qwen suits developers seeking powerful multilingual AI systems with sophisticated reasoning and minimal deployment complexity.
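
For a concrete view of the ChatML formatting mentioned above, here is a minimal sketch that renders a conversation with the Hugging Face transformers chat template; the small model ID is an illustrative assumption, and any Qwen checkpoint's tokenizer works the same way.

# Render Qwen's ChatML prompt format (<|im_start|>role ... <|im_end|>).
# A minimal sketch; the model ID below is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
# tokenize=False returns the raw ChatML string rather than token IDs.
print(tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
))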

Featured Model: Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo

Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model to date, delivering significant performance on agentic coding, agentic browser use, and other foundational coding tasks, with results comparable to Claude Sonnet.

  • Price per 1M input tokens: $0.30
  • Price per 1M output tokens: $1.20
  • Release date: 07/26/2025
  • Context size: 262,144 tokens
  • Quantization: FP4


# Assume openai>=1.0.0
from openai import OpenAI

# Create an OpenAI-compatible client pointed at DeepInfra
# (replace $DEEPINFRA_TOKEN with your actual DeepInfra API token)
openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = openai.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
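
The same endpoint also supports streaming through the standard OpenAI stream parameter; a minimal sketch, reusing the client above:

# Streaming variant: tokens arrive incrementally as they are generated.
stream = openai.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a delta with the next piece of the reply.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)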

Available Qwen Models

The Qwen series offers a comprehensive suite of dense and mixture-of-experts models.

FAQ

What is Qwen AI?

Qwen is a family of state-of-the-art large language and multimodal models developed by Alibaba Group's Qwen Team. Qwen models excel at natural language understanding, text generation, vision understanding, code generation, tool use, role play, and functioning as AI agents. Available in multiple variants (Base, Instruct, Thinking, and hybrid modes), Qwen models are designed for comprehensive AI applications with sophisticated multilingual and multimodal capabilities.

Are the Qwen models on DeepInfra optimized for low latency?

Yes. DeepInfra's autoscaling infrastructure automatically allocates resources based on demand, ensuring optimal performance and minimal cold start times. Choose between OpenAI-compatible APIs for simplicity or dedicated inference endpoints for maximum performance control. In addition, Qwen3's MoE architectures (like Qwen3-30B-A3B and Qwen3-235B-A22B) activate only subsets of parameters per token, reducing computational overhead while maintaining quality.

What's the difference between Qwen embeddings and rerankers?

Qwen embeddings convert text into numerical vectors that capture semantic meaning, enabling fast similarity search across large document collections. Qwen rerankers take a query and a small set of candidate documents and precisely reorder them by relevance. Use embeddings first to quickly find ~100 potential matches, then use rerankers to identify the top 5-10 most relevant results for your specific query.
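
As an illustration, here is a hedged sketch of that two-stage pattern using DeepInfra's OpenAI-compatible embeddings endpoint. The embedding model ID and the NumPy cosine ranking are assumptions; a production setup would pass the shortlist to a dedicated reranker for the final ordering.

# Two-stage retrieval sketch: embeddings for broad recall, then rerank
# the shortlist. Model ID and NumPy scoring are illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

docs = [
    "Qwen supports 119 languages and dialects.",
    "MoE variants activate only a subset of parameters per token.",
    "ChatML wraps each turn in <|im_start|>/<|im_end|> markers.",
]
query = "How do mixture-of-experts models save compute?"

# Stage 1: embed the query and documents, rank by cosine similarity.
resp = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-8B",  # assumed embedding model ID
    input=[query] + docs,
)
vecs = np.array([item.embedding for item in resp.data])
q, d = vecs[0], vecs[1:]
scores = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
shortlist = sorted(zip(scores, docs), reverse=True)[:2]

# Stage 2 (not shown): hand the query plus this shortlist to a Qwen
# reranker for a precise final ordering.
for score, doc in shortlist:
    print(f"{score:.3f}  {doc}")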

What tasks are Qwen models best suited for?

  • Multilingual understanding with native support for 119 languages and dialects
  • Code generation and programming, with strengthened Model Context Protocol (MCP) support
  • AI agent applications with robust tool calling and parallel execution capabilities (see the sketch after this list)
  • Complex reasoning tasks, where hybrid thinking modes enable deep problem-solving and chain-of-thought reasoning
  • Multimodal applications: vision understanding combined with text processing for image analysis and description tasks
  • Conversational AI with structured ChatML formatting for natural dialogue interactions
  • Persona-based applications (role play and character AI) via flexible system message support
  • Long-form content, with extended context windows that support lengthy document processing and generation
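
To illustrate the tool-calling point above, here is a minimal sketch using the OpenAI-compatible tools parameter against DeepInfra's endpoint; the get_weather tool and its schema are hypothetical examples.

# Tool-calling sketch via the OpenAI-compatible `tools` parameter.
# The get_weather tool and its schema are hypothetical examples.
from openai import OpenAI

client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decides to call the tool, its arguments arrive as JSON.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)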

How do I integrate Qwen models into my application?

You can integrate Qwen models seamlessly using DeepInfra's OpenAI-compatible API. Just replace your existing base URL with DeepInfra's endpoint and use your DeepInfra API key; no infrastructure setup is required. DeepInfra also supports integration through libraries like openai, litellm, and other SDKs (see the LiteLLM sketch below), making it easy to switch or scale your workloads instantly.
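
A minimal sketch of the LiteLLM route, assuming LiteLLM's deepinfra/ provider prefix and a DEEPINFRA_API_KEY environment variable:

# LiteLLM sketch: route requests to DeepInfra via the `deepinfra/`
# model prefix. Assumes DEEPINFRA_API_KEY is set in the environment.
from litellm import completion

response = completion(
    model="deepinfra/Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)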

What are the pricing details for using Qwen models on DeepInfra?

Pricing is usage-based:
  • Input Tokens: between $0.04 and $0.40 per million
  • Output Tokens: between $0.10 and $1.60 per million
Prices vary by model. There are no upfront fees; you pay only for what you use.
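
As a worked example, the cost of one request at the featured model's rates ($0.30 per 1M input tokens, $1.20 per 1M output tokens), using the token counts reported in the usage field of the sample above:

# Worked cost example at the featured model's rates.
INPUT_PRICE = 0.30 / 1_000_000   # $ per input token
OUTPUT_PRICE = 1.20 / 1_000_000  # $ per output token

prompt_tokens, completion_tokens = 11, 25  # from chat_completion.usage above
cost = prompt_tokens * INPUT_PRICE + completion_tokens * OUTPUT_PRICE
print(f"${cost:.8f}")  # $0.00003330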

How do I get started using Qwen on DeepInfra?

  • Sign in with GitHub at deepinfra.com
  • Get your API key
  • Test models directly from the browser, cURL, or SDKs
  • Review pricing on your usage dashboard
Within minutes, you can deploy apps using Qwen models—without any infrastructure setup.

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.