QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ is capable of thinking and reasoning, and achieves significantly better performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which achieves competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.
This repo contains the QwQ-32B model, which has the following features:
Note: For the best experience, please review the usage guidelines before deploying QwQ models.
For more details, please refer to our blog, GitHub, and Documentation.
To achieve optimal performance, we recommend the following settings:
Enforce Thoughtful Output: Ensure the model starts with "<think>\n" to prevent it from generating empty thinking content, which can degrade output quality. If you use `apply_chat_template` and set `add_generation_prompt=True`, this is already automatically implemented, but it may cause the response to lack the <think> tag at the beginning. This is normal behavior.
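As an illustrative sketch (not the official chat template), a QwQ-style prompt assuming Qwen's ChatML-style markers (`<|im_start|>`/`<|im_end|>`) could be built like this, forcing the assistant turn to begin with "<think>\n"; `apply_chat_template` with `add_generation_prompt=True` produces an equivalent prefix for you:

```python
# Sketch of a QwQ-style chat prompt, assuming ChatML-style markers.
# The trailing "<think>\n" forces the model to start by reasoning
# instead of emitting empty thinking content.
def build_qwq_prompt(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Open the assistant turn and pre-fill the thinking tag.
    parts.append("<|im_start|>assistant\n<think>\n")
    return "".join(parts)

prompt = build_qwq_prompt([
    {"role": "user", "content": "How many r's are in 'strawberry'?"},
])
print(prompt)
```

In practice you would pass this string (or the tokenized form) straight to generation; the exact marker tokens are an assumption here and should be taken from the tokenizer's own template.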
Sampling Parameters: Use Temperature=0.6 and TopP=0.95 instead of greedy decoding to avoid endless repetitions, and TopK between 20 and 40 to filter out rare token occurrences while maintaining output diversity.
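For instance, with Hugging Face `transformers` the commonly recommended QwQ settings (temperature 0.6, top-p 0.95, top-k in the 20 to 40 range; verify the exact values against the official blog) could be collected in a config dict:

```python
# Hedged example: QwQ sampling settings, using values from Qwen's
# published guidance (verify against the official docs).
# Greedy decoding is discouraged as it can cause repetition loops.
qwq_sampling = {
    "do_sample": True,       # required for temperature/top-p to apply
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 40,             # a value between 20 and 40 is suggested
}

# With transformers this would be passed as:
#   model.generate(**inputs, **qwq_sampling)
print(qwq_sampling)
```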
No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. This is already implemented in `apply_chat_template`.
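A minimal sketch of what this amounts to: keep only the text after the closing `</think>` tag when appending a previous assistant turn to the history. The helper below is illustrative, not the template's actual implementation:

```python
def strip_thinking(assistant_text):
    """Keep only the final answer from a QwQ response.

    Illustrative helper: QwQ output may omit the opening <think> tag,
    so we split on the closing tag and keep what follows it.
    """
    return assistant_text.split("</think>")[-1].strip()

raw = "<think>\nLet me count the letters...\n</think>\n\nThere are 3 r's."
# Only the final answer goes back into the conversation history.
history_entry = {"role": "assistant", "content": strip_thinking(raw)}
print(history_entry["content"])  # -> There are 3 r's.
```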
Standardize Output Format: We recommend using prompts to standardize model outputs when benchmarking. For example, for multiple-choice questions, include "Please show your choice in the `answer` field with only the choice letter, e.g., \"answer\": \"C\"." in the prompt.

Detailed evaluation results are reported in this 📑 blog.
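To use such a standardized format in practice, one might append the instruction to the question and parse the choice letter back out of the response. This parsing helper is a hypothetical sketch, not part of the Qwen tooling:

```python
import re

# Instruction appended to multiple-choice prompts to standardize output.
MCQ_INSTRUCTION = ('Please show your choice in the answer field with only '
                   'the choice letter, e.g., "answer": "C".')

def extract_choice(model_output):
    """Pull the choice letter out of a standardized response
    (hypothetical helper for benchmarking scripts)."""
    match = re.search(r'"answer"\s*:\s*"([A-Z])"', model_output)
    return match.group(1) if match else None

prompt = ("Which planet is largest? A) Mars B) Jupiter C) Venus\n"
          + MCQ_INSTRUCTION)
print(extract_choice('Jupiter is the largest. {"answer": "B"}'))  # -> B
```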
For requirements on GPU memory and the respective throughput, see results here.