deepseek-ai/Janus-Pro-7B

Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways while still using a single, unified transformer architecture for processing. This decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation but also enhances the framework’s flexibility. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. Its simplicity, flexibility, and effectiveness make it a strong candidate for next-generation unified multimodal models.
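The routing idea behind "separate visual pathways, one shared transformer" can be illustrated with a toy sketch. Everything below is hypothetical and deliberately tiny: `understanding_encoder`, `generation_encoder`, `CODEBOOK`, and `shared_transformer` are illustrative stand-ins, not Janus-Pro's real modules, and the "transformer" here only mean-pools its input. The point is structural: each task gets its own encoder, and both feed the same backbone.

```python
from typing import List

Vector = List[float]

# Hypothetical toy codebook: maps each discrete image-token id to a 2-d embedding.
CODEBOOK: List[Vector] = [[float(i), -float(i)] for i in range(4)]


def understanding_encoder(pixels: List[float]) -> List[Vector]:
    """Toy understanding pathway: continuous features, one per pixel."""
    return [[p, p * p] for p in pixels]


def generation_encoder(pixels: List[float], levels: int = len(CODEBOOK)) -> List[int]:
    """Toy generation pathway: quantize each pixel in [0, 1] to a discrete token id."""
    return [min(int(p * levels), levels - 1) for p in pixels]


def shared_transformer(sequence: List[Vector]) -> Vector:
    """Stand-in for the single shared backbone; here it just mean-pools its inputs."""
    dim = len(sequence[0])
    return [sum(vec[d] for vec in sequence) / len(sequence) for d in range(dim)]


def run_understanding(pixels: List[float]) -> Vector:
    # Understanding path: its own encoder, then the shared backbone.
    return shared_transformer(understanding_encoder(pixels))


def run_generation_context(pixels: List[float]) -> Vector:
    # Generation path: a different encoder (discrete ids -> codebook embeddings),
    # then the very same shared backbone.
    ids = generation_encoder(pixels)
    return shared_transformer([CODEBOOK[i] for i in ids])


print(run_understanding([0.0, 1.0]))       # [0.5, 0.5]
print(run_generation_context([0.0, 1.0]))  # [1.5, -1.5]
```

Because the two encoders never share weights, changing how images are represented for one task does not constrain the other, while the backbone stays common to both.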

Public
Price: $0.002 per image

Command line API

You can use our command-line tool deepctl to run inference:

deepctl infer \
    -m 'deepseek-ai/Janus-Pro-7B'  \
    -i image=@my_image.jpg  \
    -i 'question=Explain this image.'

This returns a response similar to:

{
  "response": "A photo of an astronaut riding a horse on Mars.",
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
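If you call the model from a script rather than reading the output by eye, a minimal sketch of pulling the answer and status out of a response shaped like the sample above might look like this. It uses only the standard-library `json` module; the helper name `parse_inference_response` is hypothetical, and the field names are taken from the sample response.

```python
import json
from typing import Any, Dict, Tuple


def parse_inference_response(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Return (answer_text, inference_status) from a JSON response body."""
    data = json.loads(raw)
    answer = data.get("response", "")
    # Fall back to an empty dict if the status block is missing or null.
    status = data.get("inference_status") or {}
    return answer, status


sample = """{
  "response": "A photo of an astronaut riding a horse on Mars.",
  "request_id": null,
  "inference_status": {"status": "unknown", "runtime_ms": 0, "cost": 0.0,
                       "tokens_generated": 0, "tokens_input": 0}
}"""

answer, status = parse_inference_response(sample)
print(answer)                            # A photo of an astronaut riding a horse on Mars.
print(status["status"], status["cost"])  # unknown 0.0
```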

Input fields

image (string)

Input image bytes for the visual question answering task


question (string)

Question about the provided image


seed (integer)

Random seed for reproducibility; defaults to a random value

Range: 0 ≤ seed < 2^64 (18446744073709551616)
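Why a fixed seed helps: seeding the random number generator makes every sampling step draw the same values, so the same inputs give the same output. The sketch below illustrates this with Python's `random.Random` as a stand-in for the model's sampler, and checks a seed against the documented range, assuming the exclusive upper bound is `2**64`. The helpers `check_seed` and `sample_tokens` are hypothetical illustrations, not part of the service.

```python
import random

SEED_UPPER = 2 ** 64  # assumed exclusive upper bound for the seed


def check_seed(seed: int) -> int:
    """Validate a seed against 0 <= seed < 2**64 and return it unchanged."""
    if not isinstance(seed, int) or isinstance(seed, bool):
        raise TypeError("seed must be an integer")
    if not 0 <= seed < SEED_UPPER:
        raise ValueError(f"seed must satisfy 0 <= seed < {SEED_UPPER}")
    return seed


def sample_tokens(seed: int, n: int = 5) -> list:
    """Toy stand-in for decoding: draw n pseudo-random token ids from a seeded RNG."""
    rng = random.Random(check_seed(seed))
    return [rng.randrange(1000) for _ in range(n)]


# Same seed, same draws: this is what makes a seeded request reproducible.
assert sample_tokens(42) == sample_tokens(42)
```

Omitting the seed corresponds to the default behavior described above, where a random seed is chosen for each request.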


top_p (number)

Top-p (nucleus) sampling parameter; higher values increase diversity

Default value: 0.95

Range: 0 ≤ top_p ≤ 1
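To make "higher values increase diversity" concrete, here is a minimal sketch of standard top-p (nucleus) sampling over a toy next-token distribution: keep the smallest set of most-probable tokens whose total probability reaches `top_p`, renormalize, and sample from that set. This shows the general technique only; it is an assumption that the service's sampler behaves the same way, and `top_p_filter` and `sample_top_p` are hypothetical helpers.

```python
import random
from typing import Dict


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest high-probability set whose mass reaches top_p, renormalized."""
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must satisfy 0 <= top_p <= 1")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


def sample_top_p(probs: Dict[str, float], top_p: float, rng: random.Random) -> str:
    """Draw one token from the renormalized nucleus."""
    kept = top_p_filter(probs, top_p)
    tokens = list(kept)
    return rng.choices(tokens, weights=[kept[t] for t in tokens], k=1)[0]


probs = {"horse": 0.5, "camel": 0.3, "rocket": 0.2}
print(top_p_filter(probs, 0.7))  # only "horse" and "camel" survive
```

With `top_p = 0.95` all three tokens stay in play, while a small `top_p` narrows the choice toward the single most likely token.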


temperature (number)

Temperature parameter; higher values increase randomness

Default value: 0.1

Range: 0 ≤ temperature ≤ 1
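Temperature typically works by dividing the model's logits by `temperature` before the softmax: values below 1 sharpen the distribution toward the most likely token, and values closer to 1 flatten it. The sketch below shows that standard scaling. Treating `temperature = 0` as greedy (argmax) decoding is an assumption made here to avoid division by zero, not documented behavior of this endpoint, and `softmax_with_temperature` is a hypothetical helper.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must satisfy 0 <= temperature <= 1")
    if temperature == 0.0:
        # Assumption: T = 0 means greedy decoding, so all mass goes to the argmax.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 1.0))  # relatively spread out
print(softmax_with_temperature(logits, 0.1))  # almost all mass on the first token
```

This also explains why the default of 0.1 yields fairly deterministic answers: at that temperature the top token dominates the distribution.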


webhook (file)

The webhook to call when inference is done. By default, the output is returned in the response to your inference request.
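The input fields above can be collected and checked before a request is sent. The sketch below applies the documented defaults and ranges, then renders the fields as repeated `-i key=value` flags in the style of the deepctl example. Only `image` and `question` appear in that example; passing `seed`, `top_p`, and `temperature` the same way is an assumption, and `build_inputs` and `to_deepctl_args` are hypothetical helpers.

```python
from typing import Any, Dict, List, Optional

MODEL = "deepseek-ai/Janus-Pro-7B"
DEFAULT_TOP_P = 0.95
DEFAULT_TEMPERATURE = 0.1


def build_inputs(image_path: str, question: str, seed: Optional[int] = None,
                 top_p: float = DEFAULT_TOP_P,
                 temperature: float = DEFAULT_TEMPERATURE) -> Dict[str, Any]:
    """Collect input fields, checking them against the documented ranges."""
    if not question:
        raise ValueError("question must be a non-empty string")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must satisfy 0 <= top_p <= 1")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must satisfy 0 <= temperature <= 1")
    inputs: Dict[str, Any] = {"image": image_path, "question": question,
                              "top_p": top_p, "temperature": temperature}
    if seed is not None:  # omit seed to keep the random default
        if not 0 <= seed < 2 ** 64:
            raise ValueError("seed must satisfy 0 <= seed < 2**64")
        inputs["seed"] = seed
    return inputs


def to_deepctl_args(inputs: Dict[str, Any]) -> List[str]:
    """Hypothetical: render inputs as `-i key=value` flags like the CLI example."""
    args = ["deepctl", "infer", "-m", MODEL]
    for key, value in inputs.items():
        # The CLI example uploads the image file with an '@' prefix.
        prefix = "@" if key == "image" else ""
        args += ["-i", f"{key}={prefix}{value}"]
    return args


print(to_deepctl_args(build_inputs("my_image.jpg", "Explain this image.")))
```

If deepctl is installed, the resulting list could be passed to `subprocess.run`; building the argument list without a shell also avoids quoting problems with questions that contain spaces.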