Qwen/Qwen2.5-Coder-32B-Instruct

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). It brings significant improvements in code generation, code reasoning, and code fixing. It also provides a more comprehensive foundation for real-world applications such as code agents, enhancing coding capabilities while maintaining its strengths in mathematics and general competencies.

Public
$0.18 / Mtoken
bfloat16
32,768

Qwen2.5-Coder-32B-Instruct

Introduction

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder currently covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. It brings the following improvements over CodeQwen1.5:

  • Significant improvements in code generation, code reasoning, and code fixing. Building on the strong Qwen2.5 base, we scale the training data up to 5.5 trillion tokens, including source code, text-code grounding data, synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.
  • A more comprehensive foundation for real-world applications such as code agents. The model not only enhances coding capabilities but also maintains its strengths in mathematics and general competencies.

This repo contains the instruction-tuned 32B Qwen2.5-Coder model, which has the following features:

  • Type: Causal Language Models
  • Training Stage: Pretraining & Post-training
  • Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Parameters: 32.5B
  • Number of Parameters (Non-Embedding): 31.0B
  • Number of Layers: 64
  • Number of Attention Heads (GQA): 40 for Q and 8 for KV
  • Context Length: Full 131,072 tokens
    • Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts.
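
To make the attention figures above more concrete, here is a minimal Python sketch of how grouped-query attention (GQA) shares key/value heads and what that implies for KV-cache size. The layer and head counts come from the feature list; `head_dim = 128` and a bfloat16 cache (2 bytes per value) are assumptions that the list does not state, and the contiguous head grouping is the common convention rather than something confirmed here. The names `AttentionConfig`, `kv_head_for_query_head`, and `kv_cache_bytes` are hypothetical helpers for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    num_layers: int = 64        # from the feature list
    num_q_heads: int = 40       # from the feature list
    num_kv_heads: int = 8       # from the feature list
    head_dim: int = 128         # ASSUMPTION: not stated in the feature list
    bytes_per_value: int = 2    # ASSUMPTION: cache stored in bfloat16


def kv_head_for_query_head(q_head: int, cfg: AttentionConfig) -> int:
    """Map a query head to the KV head it shares (contiguous grouping assumed)."""
    group_size = cfg.num_q_heads // cfg.num_kv_heads  # 40 // 8 = 5
    return q_head // group_size


def kv_cache_bytes(num_tokens: int, cfg: AttentionConfig) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    per_token = (
        2                      # one key tensor and one value tensor
        * cfg.num_layers
        * cfg.num_kv_heads
        * cfg.head_dim
        * cfg.bytes_per_value
    )
    return per_token * num_tokens


cfg = AttentionConfig()
# Each group of 5 consecutive query heads reads the same KV head.
print([kv_head_for_query_head(q, cfg) for q in range(cfg.num_q_heads)])
# Cache size for a single sequence at the full context length, in GiB.
print(kv_cache_bytes(131_072, cfg) / 2**30)
```

Under these assumptions, caching only 8 KV heads instead of 40 cuts the per-token cache by a factor of five, which matters most at the full 131,072-token context length.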

For more details, please refer to our blog, GitHub, Documentation, and arXiv paper.

Evaluation & Performance

Detailed evaluation results are reported in this 📑 blog.

For requirements on GPU memory and the respective throughput, see results here.
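
As a rough back-of-the-envelope check, separate from the measured results referenced above, the memory needed just to hold the weights can be estimated from the parameter count. The sketch below assumes 32.5B parameters stored in bfloat16 (2 bytes each) and ignores activations, the KV cache, and framework overhead, so it is only a lower bound; `weight_memory_gib` is a hypothetical helper name.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# 32.5B parameters in bfloat16 (2 bytes per parameter), weights only.
bf16_gib = weight_memory_gib(32.5e9)
print(f"{bf16_gib:.1f} GiB")
```

Actual GPU requirements will be higher than this estimate, since serving also needs memory for the KV cache and intermediate activations.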