Qwen/Qwen2.5-Coder-7B cover image

Qwen/Qwen2.5-Coder-7B

Qwen2.5-Coder-7B is a powerful code-specific large language model with 7.61 billion parameters. It's designed for code generation, reasoning, and fixing tasks. The model covers 92 programming languages and has been trained on 5.5 trillion tokens of data, including source code, text-code grounding, and synthetic data.

Qwen2.5-Coder-7B is a powerful code-specific large language model with 7.61 billion parameters. It's designed for code generation, reasoning, and fixing tasks. The model covers 92 programming languages and has been trained on 5.5 trillion tokens of data, including source code, text-code grounding, and synthetic data.

Public
$0.055 / Mtoken
32,768
ProjectPaperLicense

Input

text to generate from

maximum length of the newly generated generated text.If explicitly set to None it will be the model's max context length minus input length. (Default: 512, 1 ≤ max_new_tokens ≤ 1000000)

Temperature

temperature to use for sampling. 0 means the output is deterministic. Values greater than 1 encourage more diversity (Default: 0.7, 0 ≤ temperature ≤ 100)

Sample from the set of tokens with highest probability such that sum of probabilies is higher than p. Lower values focus on the most probable tokens.Higher values sample more low-probability tokens (Default: 0.9, 0 < top_p ≤ 1)

Min P

Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this. (Default: 0, 0 ≤ min_p ≤ 1)

Sample from the best k (number of) tokens. 0 means off (Default: 0, 0 ≤ top_k < 1000)

Repetition Penalty

repetition penalty. Value of 1 means no penalty, values greater than 1 discourage repetition, smaller than 1 encourage repetition. (Default: 1, 0.01 ≤ repetition_penalty ≤ 5)

stop
You can add more items with the button on the right

Num Responses

Number of output sequences to return. Incompatible with streaming (Default: 1, 1 ≤ num_responses ≤ 4)

How to format the response 2

Presence Penalty

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. (Default: 0, -2 ≤ presence_penalty ≤ 2)

Frequency Penalty

Positive values penalize new tokens based on how many times they appear in the text so far, increasing the model's likelihood to talk about new topics. (Default: 0, -2 ≤ frequency_penalty ≤ 2)

A unique identifier representing your end-user, which can help monitor and detect abuse. Avoid sending us any identifying information. We recommend hashing user identifiers.. (Default: empty)

Seed for random number generator. If not provided, a random seed is used. Determinism is not guaranteed. (Default: empty, 0 ≤ seed < 9223372036854776000)

You need to login to use this model

Output

I have this dream about the day I got a job at a tech company. I just woke up on a plane. I sat down on the floor and started getting work done. After getting up around 6 p.m., I looked around and

Qwen2.5-Coder-7B

Introduction

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). For Qwen2.5-Coder, we release three base language models and instruction-tuned language models, 1.5, 7 and 32 (coming soon) billion parameters. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:

  • Significantly improvements in code generation, code reasoning and code fixing. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc.
  • A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.
  • Long-context Support up to 128K tokens.

This repo contains the 7B Qwen2.5-Coder model, which has the following features:

  • Type: Causal Language Models
  • Training Stage: Pretraining
  • Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Parameters: 7.61B
  • Number of Paramaters (Non-Embedding): 6.53B
  • Number of Layers: 28
  • Number of Attention Heads (GQA): 28 for Q and 4 for KV
  • Context Length: Full 131,072 tokens
    • Please refer to this section for detailed instructions on how to deploy Qwen2.5 for handling long texts.

We do not recommend using base language models for conversations. Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., or fill in the middle tasks on this model.

For more details, please refer to our blog, GitHub, Documentation, Arxiv.

Evaluation & Performance

Detailed evaluation results are reported in this 📑 blog.

For requirements on GPU memory and the respective throughput, see results here.