DeepInfra Launches Access to NVIDIA Cosmos 3 World Foundation Models for Physical AI

We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

DeepInfra raises $107M Series B to scale the inference cloud — read the announcement

Published on 2026.06.04 by Yessen Kanapin

DeepInfra is serving NVIDIA Cosmos 3, NVIDIA's open world foundation model for physical AI, from day zero of its release. As the first omnimodel for physical AI that reasons before it generates, Cosmos 3 is live on DeepInfra today as two variants—Cosmos 3 Nano and Cosmos 3 Super—at the industry's best prices, empowering developers to build physical AI systems without compromising on budget or performance.

What Makes Cosmos 3 Different

Most generative models just generate. Cosmos 3 does something different: it reasons first, then generates. That distinction matters a great deal if you're building physical AI systems like robots or autonomous vehicles, where generating plausible-but-wrong outputs isn't just a quality issue—it's a safety one. As NVIDIA describes it, Cosmos 3 is the first OmniModel that unifies reasoning, world, and action generation in a single architecture.

Under the hood it uses a Mixture-of-Transformer architecture that combines an autoregressive reasoner with a diffusion-based generator. Inputs and outputs span text, image, video, audio, and action, making Cosmos 3 genuinely multimodal in both directions—not just for perception, but for generation and decision-making as well.

What It's Good At

Synthetic Video Data

Ranked #1 open world generation model for synthetic data generation. Use it to generate training data for physical AI at scale, without expensive real-world data collection.

Policy Backbone

Ranked #1 backbone for world action models. A strong foundation for robotics, embodied AI, and AV policy training.

Visual Reasoning

Ranked #1 open model for visual understanding on fixed infrastructure cameras—useful for smart city, warehouse, logistics deployments, infrastructure monitoring, and industrial automation.

Simulated Environments

Designed for closed-loop learning and simulation workflows. Pairs with NVIDIA AV Sim and Isaac Sim for training, testing, and evaluating physical AI systems in simulated environments before deployment.

Two Variants

Cosmos 3 Nano

The lighter variant. A good starting point for experimentation, fine-tuning, and latency-sensitive workloads.

Cosmos 3 Super

The full-capability variant. Tops the PAI Bench and R-Bench leaderboards. Use it where quality and reasoning performance are the priority.

Both are available on DeepInfra today via our standard API—the same setup as any other model, with no special configuration needed to get started.

Getting Started with NVIDIA Cosmos 3 on DeepInfra

Cosmos 3 Nano and Cosmos 3 Super are live on DeepInfra now. If you're building physical AI, robots, or AV systems and want to experiment with world modeling, reasoning, action generation, and synthetic data creation, this is a strong place to start.

Visit our models page to explore competitive rates for Cosmos 3 inference, or check out the DeepInfra docs to learn more about our complete model ecosystem and developer resources.

GLM-4.6 vs DeepSeek-V3.2: Performance, Benchmarks & DeepInfra ResultsThe open-source LLM ecosystem has evolved rapidly, and two models stand out as leaders in capability, efficiency, and practical usability: GLM-4.6, Zhipu AI’s high-capacity reasoning model with a 200k-token context window, and DeepSeek-V3.2, a sparsely activated Mixture-of-Experts architecture engineered for exceptional performance per dollar. Both models are powerful. Both are versatile. Both are widely adopted […]

NVIDIA Nemotron 3 Nano 30B API Benchmarks: Latency & CostAbout NVIDIA Nemotron 3 Nano 30B A3B NVIDIA Nemotron 3 Nano 30B A3B is a large language model trained from scratch by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It is part of the Nemotron 3 family — NVIDIA’s most efficient family of open models, built for agentic AI applications. […]

GLM-5.1 Model Overview: Features, Capabilities & Use CasesGLM-5.1 is Z.AI’s next-generation flagship model for agentic engineering, released on April 7, 2026 under the MIT license. It is a 754-billion parameter Mixture-of-Experts model with 40 billion active parameters per token, a 202,752-token context window, and up to 131K output tokens. The model is the direct successor to GLM-5, designed specifically for long-horizon autonomous […]

View all