QVQ-72B-Preview
QVQ-72B-Preview is an experimental research model developed by the Qwen team, focusing on enhancing visual reasoning capabilities.
| Benchmark | QVQ-72B-Preview | o1-2024-12-17 | gpt-4o-2024-05-13 | Claude 3.5 Sonnet-20241022 | Qwen2-VL-72B |
|---|---|---|---|---|---|
| MMMU (val) | 70.3 | 77.3 | 69.1 | 70.4 | 64.5 |
| MathVista (mini) | 71.4 | 71.0 | 63.8 | 65.3 | 70.5 |
| MathVision (full) | 35.9 | – | 30.4 | 35.6 | 25.9 |
| OlympiadBench | 20.4 | – | 25.9 | – | 11.2 |
QVQ-72B-Preview achieves strong results across these benchmarks. It scores 70.3% on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark, demonstrating solid multidisciplinary understanding and reasoning. The substantial gains on MathVision highlight its progress in mathematical reasoning, and its OlympiadBench score reflects an improved ability to tackle challenging olympiad-level problems.
But It's Not All Perfect: Acknowledging the Limitations
While QVQ-72B-Preview exhibits promising performance that surpasses expectations, it’s important to acknowledge several limitations:
Note: Currently, the model only supports single-round dialogues and image outputs. It does not support video inputs.
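For reference, here is a minimal sketch of such a single-round, image-plus-text request. It assumes QVQ-72B-Preview is loaded through the Qwen2-VL classes in Hugging Face Transformers with the qwen-vl-utils helper package installed; the image URL and prompt below are placeholders.

```python
# Minimal single-round image + text inference sketch.
# Assumes: pip install transformers qwen-vl-utils accelerate
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

MODEL_ID = "Qwen/QVQ-72B-Preview"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# A single user turn containing one image and one question (placeholder values).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/problem.png"},
            {"type": "text", "text": "Solve the problem in the image, reasoning step by step."},
        ],
    }
]

# Build the chat prompt and gather the vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)  # video_inputs stays empty: video is unsupported
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Long reasoning chains need a generous token budget.
output_ids = model.generate(**inputs, max_new_tokens=8192)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

Because only single-round dialogues are supported, the reply is not appended to `messages` for a follow-up turn; each question should be sent as a fresh request.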