Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways while still using a single, unified transformer architecture for processing. This decoupling not only alleviates the conflict between the visual encoder's roles in understanding and generation, but also enhances the framework's flexibility. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. Its simplicity, high flexibility, and effectiveness make Janus-Pro a strong candidate for next-generation unified multimodal models.
You can use our command-line tool deepctl to run inferences:
deepctl infer \
    -m 'deepseek-ai/Janus-Pro-1B' \
    -i image=@my_image.jpg \
    -i 'question=Explain this image.'
which will give you back something similar to:
{
    "response": "A photo of an astronaut riding a horse on Mars.",
    "request_id": null,
    "inference_status": {
        "status": "unknown",
        "runtime_ms": 0,
        "cost": 0.0,
        "tokens_generated": 0,
        "tokens_input": 0
    }
}
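If you are scripting against the CLI, the generated text can be pulled straight out of this JSON. The following is a minimal sketch that assumes deepctl prints the JSON document shown above to stdout and that jq is installed:

# Minimal sketch: assumes deepctl writes the JSON shown above to stdout
# and that jq is available. Prints only the "response" field.
deepctl infer \
    -m 'deepseek-ai/Janus-Pro-1B' \
    -i image=@my_image.jpg \
    -i 'question=Explain this image.' |
    jq -r '.response'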
top_p (number)
Top-p sampling parameter; higher values increase diversity.
Default value: 0.95
Range: 0 ≤ top_p ≤ 1
temperature (number)
Temperature parameter; higher values increase randomness.
Default value: 0.1
Range: 0 ≤ temperature ≤ 1
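Since top_p and temperature are part of the same input schema as the model inputs, a reasonable assumption (verify against deepctl's built-in help) is that they are passed with the same -i flag shown above:

# Hedged sketch: assumes sampling parameters are passed via -i like the
# other inputs; consult deepctl's help output for the authoritative syntax.
deepctl infer \
    -m 'deepseek-ai/Janus-Pro-1B' \
    -i image=@my_image.jpg \
    -i 'question=Explain this image.' \
    -i temperature=0.7 \
    -i top_p=0.9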
webhook (file)
The webhook to call when the inference is done. By default, you will get the output in the response of your inference request.
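For long-running requests, the webhook lets you receive the result asynchronously instead of waiting on the response. The sketch below assumes the webhook target is supplied like any other input via -i; the exact flag and payload format should be confirmed in the deepctl documentation:

# Hedged sketch: assumes the webhook target is passed via -i like the
# other inputs, and that the result JSON is delivered to this URL when
# the inference finishes. https://example.com/janus-callback is a
# hypothetical endpoint.
deepctl infer \
    -m 'deepseek-ai/Janus-Pro-1B' \
    -i image=@my_image.jpg \
    -i 'question=Explain this image.' \
    -i webhook=https://example.com/janus-callback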