nvidia/Cosmos3-Nano
$0.0108 / second (480p)
Cosmos3 is a world foundation model that unifies understanding and generation within a single Mixture-of-Transformer (MoT) architecture. Two tightly coupled towers—a Reasoner (vision-language model) and a Generator (world simulator)—share latent representations so that structured perception directly grounds realistic, temporally consistent simulation.

You can use cURL or any other HTTP client to run inference:
curl -X POST \
-d '{"prompt": "A cinematic shot of a robot arm picking up a red ball from a wooden shelf in a brightly lit laboratory.", "output_type": "video", "resolution": "256p", "aspect_ratio": "16:9", "duration_seconds": 5}' \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-H 'Content-Type: application/json' \
'https://api.deepinfra.com/v1/inference/nvidia/Cosmos3-Nano'
The response will look similar to this:
{
  "video_url": "/model/inference/pyramid_sample.mp4",
  "image_url": null,
  "seed": 0,
  "action": [
    null
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0,
    "output_length": 0
  }
}
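The same call can be made from Python. The sketch below mirrors the cURL request and the sample response above using only the standard library. The helper names (build_request, summarize_response, generate) are hypothetical and not part of any DeepInfra SDK, and the assumption that the call returns the full JSON body synchronously is taken from the sample response rather than confirmed.

```python
import json
import urllib.request

# Endpoint copied from the cURL example above.
API_URL = "https://api.deepinfra.com/v1/inference/nvidia/Cosmos3-Nano"


def build_request(prompt, token, resolution="256p",
                  aspect_ratio="16:9", duration_seconds=5):
    """Build (but do not send) the POST request from the cURL example.

    The defaults repeat the values in that example; other accepted
    values are not documented here, so treat them as assumptions.
    """
    payload = {
        "prompt": prompt,
        "output_type": "video",
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "duration_seconds": duration_seconds,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def summarize_response(body):
    """Pick out the fields most callers need from a response dict.

    Uses .get() throughout because the sample response shows several
    fields as null, so none of them is assumed to be present.
    """
    status = body.get("inference_status") or {}
    return {
        "video_url": body.get("video_url"),
        "seed": body.get("seed"),
        "status": status.get("status"),
        "runtime_ms": status.get("runtime_ms"),
        "cost": status.get("cost"),
    }


def generate(prompt, token, timeout=600):
    """Send the request and return the summarized response.

    The timeout is a guess; video generation may take a while.
    """
    request = build_request(prompt, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize_response(json.load(response))
```

To run it, read the token from the environment, for example generate("A robot arm picking up a red ball.", os.environ["DEEPINFRA_TOKEN"]) after importing os. Note that video_url in the sample response is a relative path; which host it should be resolved against is not stated here, so the sketch returns it unchanged.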
© 2026 DeepInfra. All rights reserved.