Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

Category: text-to-video

Text-to-video AI models are an emerging class of generative systems that produce coherent, high-quality videos from natural language descriptions. By combining advances in natural language processing (NLP), computer vision, and generative modeling, these systems translate textual input into dynamic visual sequences, revolutionizing the way video content is created and consumed.

The process begins with an NLP component that interprets the input text, extracting narrative structure, temporal flow, objects, and actions. This structured representation is then passed to a generative video model, which synthesizes corresponding frames with realistic motion, transitions, and context-aware elements. The result is a short video clip that accurately visualizes the described scene.

Text-to-video models open new possibilities in marketing, storytelling, film previsualization, education, and accessibility. Marketing professionals can quickly prototype commercials or product visualizations. Educators can use these tools to illustrate historical events, scientific processes, or language concepts. Filmmakers can use them to mock up scenes before full production. And for users with disabilities, text-to-video systems can provide an enhanced way to perceive and understand the world.

Wan-AI/Wan2.1-T2V-1.3B
$0.10 / video
  • text-to-video

The Wan2.1 1.3B model is a lightweight, efficient text-to-video generator. Despite its compact size, it delivers impressive performance across benchmarks and generates high-quality 480P videos.
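Below is a minimal sketch of how one might call this model through DeepInfra's generic inference endpoint (`https://api.deepinfra.com/v1/inference/<model>`). The `prompt` input field and the shape of the response are assumptions for illustration; the exact request and response fields are listed on the model's API page.

```python
# Hedged sketch: generate a clip with Wan2.1-T2V-1.3B via DeepInfra's
# generic inference endpoint. The "prompt" input field is an assumption --
# check the model's API page for the parameters it actually accepts.
import os
import requests

API_TOKEN = os.environ["DEEPINFRA_API_TOKEN"]  # your DeepInfra API token
MODEL = "Wan-AI/Wan2.1-T2V-1.3B"

resp = requests.post(
    f"https://api.deepinfra.com/v1/inference/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"prompt": "A red fox trotting through fresh snow at sunrise"},
    timeout=600,  # video generation can take a while
)
resp.raise_for_status()
result = resp.json()
print(result)  # inspect the response; it should reference the rendered 480P clip
```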

Wan-AI/Wan2.1-T2V-14B
$0.40 / video
  • text-to-video

The Wan2.1 14B model is a high-capacity, state-of-the-art video foundation model capable of producing both 480P and 720P videos. It excels at capturing complex prompts and generating visually rich, detailed scenes, making it ideal for high-end creative tasks.
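The 14B model is called the same way; the sketch below additionally downloads the rendered clip to disk. The `video_url` response field is an assumption and may differ per model, and any resolution-related parameters (for choosing 480P vs 720P output) are documented on the model's API page rather than guessed here.

```python
# Hedged sketch: call Wan2.1-T2V-14B and save the result locally.
# "prompt" and "video_url" are assumed field names; print the raw JSON
# if the response is structured differently.
import os
import requests

API_TOKEN = os.environ["DEEPINFRA_API_TOKEN"]
MODEL = "Wan-AI/Wan2.1-T2V-14B"

resp = requests.post(
    f"https://api.deepinfra.com/v1/inference/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"prompt": "Aerial shot of a coastal city at dusk, cinematic lighting"},
    timeout=900,
)
resp.raise_for_status()
video_url = resp.json().get("video_url")  # assumed field name

if video_url:
    video = requests.get(video_url, timeout=300)
    video.raise_for_status()
    with open("wan21_14b_output.mp4", "wb") as f:
        f.write(video.content)
```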