ByteDance/
$4.300 / 1M tokens
A new-generation, professional-grade multimodal video creation model that supports video generation with multimodal reference inputs, including images, videos, and audio.

Prompt
text prompt for video generation
Resolution
resolution of the output video
Aspect Ratio
aspect ratio of the output video
Duration
duration of the output video in seconds (4-15, or -1 for model to decide) (Default: empty)
Seed
random seed for reproducible output (Default: empty, -1 ≤ seed < 4294967296)
Watermark
whether to add a watermark to the output video
Generate Audio
whether the generated video includes audio synchronized with the visuals
Return Last Frame
whether to return the last frame image of the generated video
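The parameters above can be checked before a request is sent. The following is a minimal sketch in Python, not DeepInfra's actual client: the page shows only display labels, so every key name here (prompt, duration, seed, watermark, generate_audio, return_last_frame) and the function name build_video_request are assumptions made for illustration. Only the numeric ranges are taken from the descriptions above. Resolution and aspect ratio are left out because the page does not list their allowed values.

```python
from typing import Any, Dict, Optional

# Exclusive upper bound taken from the Seed description (equal to 2**32).
SEED_UPPER_BOUND = 4294967296


def build_video_request(
    prompt: str,
    duration: Optional[int] = None,
    seed: Optional[int] = None,
    watermark: Optional[bool] = None,
    generate_audio: Optional[bool] = None,
    return_last_frame: Optional[bool] = None,
) -> Dict[str, Any]:
    """Validate inputs against the documented ranges and return a payload dict.

    All key names are hypothetical; the page lists only display labels.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    payload: Dict[str, Any] = {"prompt": prompt}

    if duration is not None:
        # Documented: 4-15 seconds, or -1 to let the model decide.
        if duration != -1 and not 4 <= duration <= 15:
            raise ValueError("duration must be -1 or between 4 and 15 seconds")
        payload["duration"] = duration

    if seed is not None:
        # Documented: -1 <= seed < 4294967296.
        if not -1 <= seed < SEED_UPPER_BOUND:
            raise ValueError("seed must satisfy -1 <= seed < 4294967296")
        payload["seed"] = seed

    # Boolean switches are included only when explicitly set, so that
    # unset fields fall back to whatever default the service applies.
    for key, value in (
        ("watermark", watermark),
        ("generate_audio", generate_audio),
        ("return_last_frame", return_last_frame),
    ):
        if value is not None:
            payload[key] = value
    return payload
```

For example, build_video_request("a cat surfing at sunset", duration=-1, seed=42) returns a dict with only the prompt, duration, and seed keys, while a duration of 3 or a seed of 4294967296 raises ValueError. Leaving optional fields out of the payload mirrors the "Default: empty" wording above, though how the service treats missing fields is not stated on the page.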
Dreamina Seedance 2.0 is a new-generation professional-grade multimodal video creation model developed by the Doubao Large Model Team. It supports generating videos using multimodal reference inputs such as images, videos, and audio, breaking the creative limitations of single-source materials. It also features video editing and video extension capabilities, enabling high-precision reproduction of object details, textures, timbres, visual effects styles, camera movements, and more. Character features can be stably maintained as well, giving creators director-level control over their work.
Dreamina Seedance 2.0 targets highly realistic, stable audio and video: with strong motion stability and image detail, it produces visually striking footage comparable to live-action shooting. It handles complex scenes well, reproducing everything from subtle facial expressions and intense physical interactions to dynamic song and dance performances. It also includes built-in professional camera movement, multi-shot narration, and text generation capabilities to heighten narrative tension. With native synchronized audio and video, it generates rich sound effects that precisely match the visuals, and it supports performances in multiple languages, accents, and dialects, making videos feel more complete and immersive.
Dreamina Seedance 2.0 is tailored to three core scenarios: commercial advertising, film and television production, and social media marketing. With industrial-grade generation quality, it improves the success rate of content creation, lowers the barriers and costs of producing high-quality content, and streamlines the production workflow from concept to final video, bringing significant efficiency gains to the industry.
Multimodal reference-based generation with fine-grained feature preservation
Precise, targeted video editing
Seamless video extension with temporal coherence
© 2026 Deep Infra. All rights reserved.