Developed by Alibaba Group's Qwen Team, Qwen is a family of state-of-the-art large language and multimodal models designed for comprehensive AI capabilities and multilingual performance. The latest Qwen3 generation features balanced model architectures including reintroduced Mixture-of-Experts (MoE) variants (Qwen3-30B-A3B and Qwen3-235B-A22B) alongside dense models up to 32B parameters, enabling efficient resource utilization through dynamic parameter activation.
With support for 119 languages and dialects, hybrid thinking modes that seamlessly alternate between reasoning and instruction-following without model switching, and extended context windows (up to 1M tokens in Qwen3-2507), Qwen excels in tasks like multilingual understanding, code generation, agentic workflows, and complex problem-solving. The models utilize advanced Byte-level Byte Pair Encoding with a 151,646-token vocabulary, structured ChatML formatting for conversational interactions, and robust tool calling capabilities with parallel execution support.
Available in both proprietary and open-weight versions with flexible licensing, comprehensive model variants (Base, Instruct, Thinking, and hybrid modes), and enhanced Model Context Protocol support, Qwen is ideal for developers seeking powerful, multilingual AI systems with sophisticated reasoning capabilities and minimal deployment complexity.
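To make the ChatML formatting mentioned above concrete, here is a minimal sketch of how a conversation is rendered into a ChatML prompt string. The `<|im_start|>`/`<|im_end|>` special tokens follow the ChatML convention used by Qwen models; the `to_chatml` helper is illustrative, not part of any Qwen library.

```python
# Illustrative sketch of the ChatML structure used for Qwen conversations.
# The <|im_start|>/<|im_end|> special tokens follow the ChatML convention;
# exact system-prompt defaults vary by model.

def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```

In practice the chat API shown below applies this template server-side, so you only send the plain `messages` list.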
Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model, delivering strong performance on agentic coding, agentic browser use, and other foundational coding tasks, with results comparable to Claude Sonnet.
Price per 1M input tokens: $0.29
Price per 1M output tokens: $1.20
Release Date: 07/26/2025
Context Size: 262,144
Quantization: fp4
# Assume openai>=1.0.0
from openai import OpenAI

# Create an OpenAI client with your DeepInfra token and endpoint
openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = openai.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
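The `usage` fields returned with each response let you estimate request cost from the per-1M-token prices listed above. A minimal sketch, using the Turbo model's prices and the token counts from the example output:

```python
# Estimate request cost from token usage and the listed per-1M-token
# prices for Qwen3-Coder-480B-A35B-Instruct-Turbo.
PRICE_INPUT_PER_M = 0.29   # $ per 1M input tokens
PRICE_OUTPUT_PER_M = 1.20  # $ per 1M output tokens

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the dollar cost of a single request."""
    return (prompt_tokens * PRICE_INPUT_PER_M
            + completion_tokens * PRICE_OUTPUT_PER_M) / 1_000_000

# The example above reported 11 prompt tokens and 25 completion tokens:
print(f"${request_cost(11, 25):.8f}")  # → $0.00003319
```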
DeepInfra provides access to Qwen's latest generation of large language models, offering both specialized coding models and general-purpose AI systems with advanced reasoning capabilities.
| Model | Context | $ per 1M input tokens | $ per 1M output tokens |
|---|---|---|---|
| Qwen3-Coder-480B-A35B-Instruct-Turbo | 256k | $0.29 | $1.20 |
| Qwen3-Coder-480B-A35B-Instruct | 256k | $0.40 | $1.60 |
| Qwen3-235B-A22B-Thinking-2507 | 256k | $0.30 | $2.90 |
| Qwen3-235B-A22B-Instruct-2507 | 256k | $0.09 | $0.60 |
| QwQ-32B | 128k | $0.15 | $0.40 |
| Qwen3-235B-A22B | 40k | $0.18 | $0.54 |
| Qwen3-32B | 40k | $0.10 | $0.28 |
| Qwen3-30B-A3B | 40k | $0.08 | $0.29 |
| Qwen3-14B | 40k | $0.06 | $0.24 |
| Qwen2.5-72B-Instruct | 32k | $0.12 | $0.39 |
| Qwen2.5-Coder-32B-Instruct | 32k | $0.06 | $0.15 |
| Qwen2.5-7B-Instruct | 32k | $0.04 | $0.10 |
Qwen is a family of state-of-the-art large language and multimodal models developed by Alibaba Group's Qwen Team. Qwen models excel at natural language understanding, text generation, vision understanding, code generation, tool use, role play, and functioning as AI agents. Available in multiple variants (Base, Instruct, Thinking, and hybrid modes), Qwen models are designed for comprehensive AI applications with sophisticated multilingual and multimodal capabilities.
Yes. DeepInfra's autoscaling infrastructure automatically allocates resources based on demand, ensuring optimal performance and minimal cold start times. Choose between OpenAI-compatible APIs for simplicity or dedicated inference endpoints for maximum performance control. In addition, Qwen3's MoE architectures (like Qwen3-30B-A3B and Qwen3-235B-A22B) activate only subsets of parameters per token, reducing computational overhead while maintaining quality.
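The "A3B"/"A22B" suffixes name the active parameters per token. A quick sketch of the resulting compute reduction, under the rough assumption that per-token FLOPs scale linearly with active parameters:

```python
# Rough sketch of why MoE reduces per-token compute: only the "active"
# parameters (the A3B / A22B suffix) participate in each forward pass.
# Assumes per-token FLOPs scale roughly linearly with active parameters.
models = {
    "Qwen3-30B-A3B": (30e9, 3e9),      # (total params, active params)
    "Qwen3-235B-A22B": (235e9, 22e9),
}
for name, (total, active) in models.items():
    print(f"{name}: {active / total:.0%} of parameters active per token")
```

So Qwen3-30B-A3B routes each token through roughly a tenth of its weights, which is where the efficiency gain comes from.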
Qwen embeddings convert text into numerical vectors that capture semantic meaning, enabling fast similarity search across large document collections. Qwen rerankers take a query and a small set of candidate documents and precisely reorder them by relevance.
Use embeddings first to quickly find ~100 potential matches, then use rerankers to identify the top 5-10 most relevant results for your specific query.
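The embed-then-rerank pipeline described above can be sketched end to end. This is a toy illustration only: bag-of-words cosine similarity stands in for a real Qwen embedding model, and term overlap stands in for a real Qwen reranker, but the two-stage shape (cheap shortlist, then precise reordering) is the same.

```python
import math
from collections import Counter

# Toy two-stage retrieval. Bag-of-words cosine and term overlap are
# stand-ins for real Qwen embedding and reranker models; the pipeline
# shape (broad shortlist, then precise rerank) is what matters.

def embed(text):
    """Toy 'embedding': a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, shortlist=100, top=5):
    q = embed(query)
    # Stage 1: fast similarity search over the whole collection.
    candidates = sorted(docs, key=lambda d: cosine(q, embed(d)),
                        reverse=True)[:shortlist]
    # Stage 2: precise reranking of the small candidate set.
    q_terms = set(query.lower().split())
    rerank = lambda d: len(q_terms & set(d.lower().split()))
    return sorted(candidates, key=rerank, reverse=True)[:top]

docs = [
    "Qwen models support tool calling",
    "Embeddings enable similarity search",
    "Rerankers reorder candidate documents by relevance",
]
print(retrieve("reorder documents by relevance", docs, shortlist=3, top=1))
```

In production, `embed` would call an embedding endpoint once per document (with vectors cached in a vector store) and the rerank stage would score only the shortlisted query-document pairs.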
© 2025 Deep Infra. All rights reserved.