Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment. The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware.
Mistral Small 3
Mistral Small 3 (2501) sets a new benchmark in the "small" Large Language Model category (below 70B parameters), boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models!
This model is an instruction-fine-tuned version of the base model: Mistral-Small-24B-Base-2501.
Mistral Small can be deployed locally and is exceptionally "knowledge-dense": once quantized to 4-bit precision, the 24B parameters occupy roughly 12-14 GB, so the model fits on a single RTX 4090 or a 32GB-RAM MacBook, as sketched below.
Perfect for:

- Fast-response conversational agents.
- Low-latency function calling.
- Subject matter experts via fine-tuning.
- Local inference for hobbyists and organizations handling sensitive data.
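As a rough illustration of the quantized local-deployment claim, here is a minimal sketch using Hugging Face transformers with 4-bit bitsandbytes quantization. The model ID is the public Hugging Face repository; the prompt and generation settings are illustrative placeholders, not official recommendations.

```python
# Minimal local-inference sketch, assuming a CUDA GPU and the
# transformers + accelerate + bitsandbytes packages are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"

# 4-bit quantization keeps the ~24B-parameter weights around 12-14 GB,
# which is how the model fits on a single RTX 4090.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Illustrative prompt; any chat-formatted conversation works here.
messages = [{"role": "user", "content": "Summarize what makes a model 'knowledge-dense'."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```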
For enterprises that need specialized capabilities (increased context, particular modalities, domain-specific knowledge, etc.), we will be releasing commercial models beyond what Mistral AI contributes to the community.
This release demonstrates our commitment to open source, serving as a strong base model.
Learn more about Mistral Small in our blog post.
Model developer: Mistral AI Team
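For the low-latency use cases listed above (fast-response chat, function calling), a minimal serving sketch with vLLM's offline chat API may be useful. This is a hedged example: the model ID is the public Hugging Face repository, `tokenizer_mode="mistral"` assumes a vLLM version with mistral-common tokenizer support, and the sampling values are illustrative.

```python
# Hedged serving sketch using vLLM's offline chat API; assumes vLLM is
# installed with enough GPU memory for the checkpoint (or a quantized one).
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    tokenizer_mode="mistral",  # use mistral-common tokenization
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name three use cases for a low-latency 24B model."},
]

# Low temperature for deterministic, fast-response chat (illustrative value).
outputs = llm.chat(messages, sampling_params=SamplingParams(temperature=0.15, max_tokens=256))
print(outputs[0].outputs[0].text)
```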
Human evaluated benchmarks

| Category | Gemma-2-27B | Qwen-2.5-32B | Llama-3.3-70B | GPT-4o-mini |
|---|---|---|---|---|
| Mistral is better | 0.536 | 0.496 | 0.192 | 0.200 |
| Mistral is slightly better | 0.196 | 0.184 | 0.164 | 0.204 |
| Ties | 0.052 | 0.060 | 0.236 | 0.160 |
| Other is slightly better | 0.060 | 0.088 | 0.112 | 0.124 |
| Other is better | 0.156 | 0.172 | 0.296 | 0.312 |
Note: Values are the fraction of verdicts from side-by-side human evaluations conducted with an external third-party vendor on a set of over 1k proprietary coding and generalist prompts.
Reasoning & Knowledge
| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|
| mmlu_pro_5shot_cot_instruct | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 |
| gpqa_main_cot_5shot_instruct | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 |
Math & Coding
| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|
| humaneval_instruct_pass@1 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 |
| math_instruct | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 |
Instruction following
| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|---|---|---|---|---|---|
| mtbench_dev | 8.35 | 7.86 | 7.96 | 8.26 | 8.33 |
| wildbench | 52.27 | 48.21 | 50.04 | 52.73 | 56.13 |
| arena_hard | 0.873 | 0.788 | 0.840 | 0.860 | 0.897 |
| ifeval | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 |
Note: All benchmark numbers above were obtained through the same internal evaluation pipeline, so they may vary slightly from previously reported results for these models.