nvidia/
$0.010 / 1M tokens
llama-nemotron-rerank-vl-1b-v2 is a 1.7B-parameter multimodal reranking model that scores and orders document images and text by their relevance to a user query. It excels at understanding complex visual content such as charts, tables, and infographics.

You can use cURL or any other HTTP client to run inference:
curl -X POST \
-d '{"queries": ["What is the capital of United States of America?"], "documents": ["The capital of USA is Washington DC."]}' \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-H 'Content-Type: application/json' \
'https://api.deepinfra.com/v1/inference/nvidia/llama-nemotron-rerank-vl-1b-v2'
The response will look similar to this:
{
  "scores": [
    0.1,
    0.2,
    0.3
  ],
  "input_tokens": 42,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0,
    "output_length": 0
  }
}
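The same request can be made from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names `build_request`, `rerank`, and `rank_documents` are hypothetical, and the sketch assumes a single query for which the `scores` array holds one score per document, in the order the documents were sent.

```python
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/inference/nvidia/llama-nemotron-rerank-vl-1b-v2"


def build_request(queries, documents, token):
    """Build the same POST request as the cURL example above."""
    payload = json.dumps({"queries": queries, "documents": documents}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rerank(queries, documents, token):
    """Send the request and return the decoded JSON response body."""
    request = build_request(queries, documents, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def rank_documents(documents, scores):
    """Pair each document with its score and sort, highest score first.

    Assumption (not confirmed by the page): scores[i] belongs to documents[i].
    """
    if len(documents) != len(scores):
        raise ValueError("expected one score per document")
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
```

To use it, read the token from the `DEEPINFRA_TOKEN` environment variable, call `rerank(...)` with one query and your candidate documents, then pass the documents and `body["scores"]` to `rank_documents` to get them in descending order of relevance.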
© 2026 Deep Infra. All rights reserved.