DeepInfra raises $107M Series B to scale the inference cloud — read the announcement

DeepInfra is serving NVIDIA Cosmos 3, NVIDIA's open world foundation model for physical AI, from day zero of its release. As the first omnimodel for physical AI that reasons before it generates, Cosmos 3 is live on DeepInfra today as two variants—Cosmos 3 Nano and Cosmos 3 Super—at the industry's best prices, empowering developers to build physical AI systems without compromising on budget or performance.
Most generative models just generate. Cosmos 3 does something different: it reasons first, then generates. That distinction matters a great deal if you're building physical AI systems like robots or autonomous vehicles, where generating plausible-but-wrong outputs isn't just a quality issue—it's a safety one. As NVIDIA describes it, Cosmos 3 is the first OmniModel that unifies reasoning, world, and action generation in a single architecture.
Under the hood it uses a Mixture-of-Transformer architecture that combines an autoregressive reasoner with a diffusion-based generator. Inputs and outputs span text, image, video, audio, and action, making Cosmos 3 genuinely multimodal in both directions—not just for perception, but for generation and decision-making as well.
Ranked #1 open world generation model for synthetic data generation. Use it to generate training data for physical AI at scale, without expensive real-world data collection.
Ranked #1 backbone for world action models. A strong foundation for robotics, embodied AI, and AV policy training.
Ranked #1 open model for visual understanding on fixed infrastructure cameras—useful for smart city, warehouse, logistics deployments, infrastructure monitoring, and industrial automation.
Designed for closed-loop learning and simulation workflows. Pairs with NVIDIA AV Sim and Isaac Sim for training, testing, and evaluating physical AI systems in simulated environments before deployment.
The lighter variant. A good starting point for experimentation, fine-tuning, and latency-sensitive workloads.
The full-capability variant. Tops the PAI Bench and R-Bench leaderboards. Use it where quality and reasoning performance are the priority.
Both are available on DeepInfra today via our standard API—the same setup as any other model, with no special configuration needed to get started.
Cosmos 3 Nano and Cosmos 3 Super are live on DeepInfra now. If you're building physical AI, robots, or AV systems and want to experiment with world modeling, reasoning, action generation, and synthetic data creation, this is a strong place to start.
Visit our models page to explore competitive rates for Cosmos 3 inference, or check out the DeepInfra docs to learn more about our complete model ecosystem and developer resources.
Best API Providers for DeepSeek V4 in 2026<p>DeepSeek V4 is available across a range of hosted API providers, each with different pricing, performance, and deployment trade-offs. The model comes in two variants: V4 Pro, a 1.6 trillion total parameter Mixture-of-Experts model with 49 billion active parameters and a 1M token context window, and V4 Flash, a lighter 284B total parameter variant built […]</p>
Deploy Custom LLMs on DeepInfraDid you just finetune your favorite model and are wondering where to run it?
Well, we have you covered. Simple API and predictable pricing.
Put your model on huggingface
Use a private repo, if you wish, we don't mind. Create a hf access token just
for the repo for better security.
Create c...
Best API for Kimi K2.5: Why DeepInfra Leads in Speed, TTFT, and Scalability<p>Kimi K2.5 is positioned as Moonshot AI’s “do-it-all” model for modern product workflows: native multimodality (text + vision/video), Instant vs. Thinking modes, and support for agentic / multi-agent (“swarm”) execution patterns. In real applications, though, model capability is only half the story. The provider’s inference stack determines the things your users actually feel: time-to-first-token (TTFT), […]</p>
© 2026 DeepInfra. All rights reserved.