
Qwen Model Family

Developed by Alibaba Group's Qwen Team, Qwen is a family of state-of-the-art large language and multimodal models built for broad AI capability and strong multilingual performance. The latest Qwen3 generation spans reintroduced Mixture-of-Experts (MoE) variants (Qwen3-30B-A3B and Qwen3-235B-A22B) alongside dense models of up to 32B parameters, with the MoE models activating only a fraction of their parameters per token for efficient resource utilization. With support for 119 languages and dialects, hybrid thinking modes that alternate between reasoning and instruction-following without model switching, and extended context windows (up to 1M tokens in Qwen3-2507), Qwen excels at multilingual understanding, code generation, agentic workflows, and complex problem-solving. The models use byte-level Byte Pair Encoding with a 151,646-token vocabulary, structured ChatML formatting for conversational interactions, and robust tool calling with parallel execution support. Available in both proprietary and open-weight versions with flexible licensing, comprehensive model variants (Base, Instruct, Thinking, and hybrid modes), and enhanced Model Context Protocol support, Qwen suits developers seeking powerful multilingual AI systems with sophisticated reasoning and minimal deployment complexity.
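
For a concrete view of the ChatML formatting mentioned above, here is a minimal sketch that renders a conversation with the Hugging Face transformers chat template; the small model ID is an illustrative assumption, and any Qwen checkpoint's tokenizer works the same way.

# Render Qwen's ChatML prompt format (<|im_start|>role ... <|im_end|>).
# A minimal sketch; the model ID below is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
# tokenize=False returns the raw ChatML string rather than token IDs.
print(tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
))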

Featured Model: Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo

Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model to date, delivering significant performance on agentic coding, agentic browser use, and other foundational coding tasks, with results comparable to Claude Sonnet.

  • Price per 1M input tokens: $0.30
  • Price per 1M output tokens: $1.20
  • Release date: 07/26/2025
  • Context size: 262,144 tokens
  • Quantization: FP4


# Assume openai>=1.0.0
from openai import OpenAI

# Create an OpenAI-compatible client pointed at DeepInfra
# (replace $DEEPINFRA_TOKEN with your actual DeepInfra API token)
openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = openai.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
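
The same endpoint also supports streaming through the standard OpenAI stream parameter; a minimal sketch, reusing the client above:

# Streaming variant: tokens arrive incrementally as they are generated.
stream = openai.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a delta with the next piece of the reply.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)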

Available Qwen Models

The Qwen series offers a comprehensive suite of dense and mixture-of-experts models.

FAQ

What is Qwen AI?

Qwen is a family of state-of-the-art large language and multimodal models developed by Alibaba Group's Qwen Team. Qwen models excel at natural language understanding, text generation, vision understanding, code generation, tool use, role play, and functioning as AI agents. Available in multiple variants (Base, Instruct, Thinking, and hybrid modes), Qwen models are designed for comprehensive AI applications with sophisticated multilingual and multimodal capabilities.

Are the Qwen models on DeepInfra optimized for low latency?

Yes. DeepInfra's autoscaling infrastructure automatically allocates resources based on demand, ensuring optimal performance and minimal cold start times. Choose between OpenAI-compatible APIs for simplicity or dedicated inference endpoints for maximum performance control. In addition, Qwen3's MoE architectures (like Qwen3-30B-A3B and Qwen3-235B-A22B) activate only subsets of parameters per token, reducing computational overhead while maintaining quality.

What's the difference between Qwen embeddings and rerankers?

Qwen embeddings convert text into numerical vectors that capture semantic meaning, enabling fast similarity search across large document collections. Qwen rerankers take a query and a small set of candidate documents and precisely reorder them by relevance. Use embeddings first to quickly find ~100 potential matches, then use rerankers to identify the top 5-10 most relevant results for your specific query.
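
As an illustration, here is a hedged sketch of that two-stage pattern using DeepInfra's OpenAI-compatible embeddings endpoint. The embedding model ID and the NumPy cosine ranking are assumptions; a production setup would pass the shortlist to a dedicated reranker for the final ordering.

# Two-stage retrieval sketch: embeddings for broad recall, then rerank
# the shortlist. Model ID and NumPy scoring are illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

docs = [
    "Qwen supports 119 languages and dialects.",
    "MoE variants activate only a subset of parameters per token.",
    "ChatML wraps each turn in <|im_start|>/<|im_end|> markers.",
]
query = "How do mixture-of-experts models save compute?"

# Stage 1: embed the query and documents, rank by cosine similarity.
resp = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-8B",  # assumed embedding model ID
    input=[query] + docs,
)
vecs = np.array([item.embedding for item in resp.data])
q, d = vecs[0], vecs[1:]
scores = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
shortlist = sorted(zip(scores, docs), reverse=True)[:2]

# Stage 2 (not shown): hand the query plus this shortlist to a Qwen
# reranker for a precise final ordering.
for score, doc in shortlist:
    print(f"{score:.3f}  {doc}")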

What tasks are Qwen models best suited for?

  • Multilingual understanding with native support for 119 languages and dialects
  • Code generation and programming, with strengthened Model Context Protocol (MCP) support
  • AI agent applications with robust tool calling and parallel execution capabilities (see the sketch after this list)
  • Complex reasoning tasks, where hybrid thinking modes enable deep problem-solving and chain-of-thought reasoning
  • Multimodal applications: vision understanding combined with text processing for image analysis and description tasks
  • Conversational AI with structured ChatML formatting for natural dialogue interactions
  • Persona-based applications (role play and character AI) via flexible system message support
  • Long-form content, with extended context windows that support lengthy document processing and generation
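
To illustrate the tool-calling point above, here is a minimal sketch using the OpenAI-compatible tools parameter against DeepInfra's endpoint; the get_weather tool and its schema are hypothetical examples.

# Tool-calling sketch via the OpenAI-compatible `tools` parameter.
# The get_weather tool and its schema are hypothetical examples.
from openai import OpenAI

client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decides to call the tool, its arguments arrive as JSON.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)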

How do I integrate Qwen models into my application?

You can integrate Qwen models seamlessly using DeepInfra's OpenAI-compatible API. Just replace your existing base URL with DeepInfra's endpoint and use your DeepInfra API key; no infrastructure setup is required. DeepInfra also supports integration through libraries like openai, litellm, and other SDKs (see the LiteLLM sketch below), making it easy to switch or scale your workloads instantly.
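
A minimal sketch of the LiteLLM route, assuming LiteLLM's deepinfra/ provider prefix and a DEEPINFRA_API_KEY environment variable:

# LiteLLM sketch: route requests to DeepInfra via the `deepinfra/`
# model prefix. Assumes DEEPINFRA_API_KEY is set in the environment.
from litellm import completion

response = completion(
    model="deepinfra/Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)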

What are the pricing details for using Qwen models on DeepInfra?

Pricing is usage-based:
  • Input Tokens: between $0.04 and $0.40 per million
  • Output Tokens: between $0.10 and $1.60 per million
Prices vary by model. There are no upfront fees; you pay only for what you use.
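
As a worked example, the cost of one request at the featured model's rates ($0.30 per 1M input tokens, $1.20 per 1M output tokens), using the token counts reported in the usage field of the sample above:

# Worked cost example at the featured model's rates.
INPUT_PRICE = 0.30 / 1_000_000   # $ per input token
OUTPUT_PRICE = 1.20 / 1_000_000  # $ per output token

prompt_tokens, completion_tokens = 11, 25  # from chat_completion.usage above
cost = prompt_tokens * INPUT_PRICE + completion_tokens * OUTPUT_PRICE
print(f"${cost:.8f}")  # $0.00003330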

How do I get started using Qwen on DeepInfra?

  • Sign in with GitHub at deepinfra.com
  • Get your API key
  • Test models directly from the browser, cURL, or SDKs
  • Review pricing on your usage dashboard
Within minutes, you can deploy apps using Qwen models—without any infrastructure setup.

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.