Qwen/
$0.30 in, $1.49 out
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.

You can POST to our OpenAI Chat Completions-compatible endpoint.
Passing the URL of an image is the easiest way to perform OCR.
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
    "model": "Qwen/Qwen3-VL-235B-A22B-Instruct",
    "max_tokens": 4092,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://url.com/to/shakespeare.png"
            }
          }
        ]
      }
    ]
  }'
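The request above sends only the image. The OpenAI Chat Completions format also allows a text part in the same content array, which can be used to tell the model what to do with the image. The following is a minimal sketch, not an official recipe: `make_ocr_payload` is a hypothetical helper name, the prompt wording is an assumption, and the helper does no JSON escaping, so it assumes neither argument contains double quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: print a request body that pairs a text instruction
# with an image URL. No JSON escaping is done, so the arguments must not
# contain double quotes or backslashes.
make_ocr_payload() {
  image_url=$1
  prompt=$2
  cat <<EOF
{
  "model": "Qwen/Qwen3-VL-235B-A22B-Instruct",
  "max_tokens": 4092,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "$prompt" },
        { "type": "image_url", "image_url": { "url": "$image_url" } }
      ]
    }
  ]
}
EOF
}

# Example use (performs a network call, so it is left commented out):
# make_ocr_payload "https://url.com/to/shakespeare.png" \
#     "Transcribe the text in this image." |
#   curl "https://api.deepinfra.com/v1/openai/chat/completions" \
#     -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
#     -d @-
```

Building the body in a function keeps the curl invocation short and lets the same payload shape be reused for different images and prompts.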
Another option is to read the image from a file and send it inline as a base64 data URL.
# -w 0 (GNU coreutils) disables line wrapping so the output is a single line
BASE64_IMAGE=$(base64 -w 0 shakespeare.png)
# The heredoc delimiter is unquoted so that $BASE64_IMAGE is expanded
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d @- <<EOF
{
  "model": "Qwen/Qwen3-VL-235B-A22B-Instruct",
  "max_tokens": 4092,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,$BASE64_IMAGE"
          }
        }
      ]
    }
  ]
}
EOF
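The `-w 0` flag is a GNU coreutils option, and other `base64` implementations may not accept it. As a hedged sketch, the data URL can be built in a way that does not depend on that flag by stripping newlines with `tr`. `image_data_url` is a hypothetical helper name, and the extension-to-MIME mapping is illustrative only: the source does not say which image formats the endpoint accepts.

```shell
#!/bin/sh
# Hypothetical helper: print a data URL for an image file.
# The MIME type is guessed from the file extension (illustrative mapping).
image_data_url() {
  file=$1
  case $file in
    *.png)        mime=image/png ;;
    *.jpg|*.jpeg) mime=image/jpeg ;;
    *.webp)       mime=image/webp ;;
    *) echo "unsupported extension: $file" >&2; return 1 ;;
  esac
  # Read the file on stdin and delete any line breaks the encoder inserts,
  # so the result is a single line regardless of the base64 implementation.
  printf 'data:%s;base64,%s' "$mime" "$(base64 < "$file" | tr -d '\n')"
}
```

The result can be assigned to a variable, for example `DATA_URL=$(image_data_url shakespeare.png)`, and substituted for the `"url"` value in the request body.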
© 2025 Deep Infra. All rights reserved.