PrunaAI/
$0.025 / second
Pruna's talking-head video generation model. Provide a portrait image and either a speech script or an audio file, and the model generates a realistic video of the person speaking. Supports multiple voices, languages, and output resolutions.
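As a rough budgeting aid, the per-second price listed above can be turned into a cost estimate. This is a minimal sketch: the page does not say whether the billed "second" refers to output video length or to processing time, so `billed_seconds` is a hypothetical input and the helper name is illustrative only.

```python
from decimal import Decimal

# Price taken from the listing above: $0.025 per second.
PRICE_PER_SECOND_USD = Decimal("0.025")


def estimate_cost_usd(billed_seconds):
    """Return an estimated charge in USD for a number of billed seconds.

    Assumption: billing is linear in seconds with no minimum charge.
    The page does not confirm this.
    """
    if billed_seconds < 0:
        raise ValueError("billed_seconds must be non-negative")
    # Decimal avoids binary floating-point rounding in currency math.
    return PRICE_PER_SECOND_USD * Decimal(str(billed_seconds))


if __name__ == "__main__":
    print(estimate_cost_usd(12))
```

Under those assumptions, a 12-second job would come to about 30 cents.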
Image
URL of the input portrait image (first frame). Supports jpg, jpeg, png, webp.
Audio
URL of an audio file to drive the talking head. Takes priority over voice_script when both are provided. (Default: empty)
Voice Script
Script for the person to say when no audio is uploaded. (Default: empty)
Voice
Voice for generated speech. Defaults to 'Zephyr (Female)'.
Voice Language
Output language for generated speech. Defaults to 'English (US)'.
Voice Prompt
Speaking style, tone, pacing, or emotion instructions for the generated speech. (Default: empty)
Video Prompt
Optional prompt for the video. (Default: empty)
Resolution
Resolution of the generated video. Defaults to 720p.
Seed
Random seed for reproducible generation. (Default: empty)
Disable Safety Filter
Disables the safety filter for prompts and the input image. Defaults to true.
Disable Prompt Upsampling
When true, skip the prompt upsampler and pass the raw user prompt to the video model.
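The parameters above could be assembled into a request body along the following lines. This is a sketch under stated assumptions, not the service's documented client: only `voice_script` appears as a field name on this page, so the other keys (`image`, `audio`, `voice`, `voice_language`, `resolution`, `seed`) are guessed snake_case forms of the labels, and `build_talking_head_input` is a hypothetical helper. It makes no network call; it only checks the image extension and mirrors the rule that audio takes priority over the script.

```python
from typing import Optional
from urllib.parse import urlparse

# Image formats listed in the Image parameter description.
SUPPORTED_IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "webp"}


def build_talking_head_input(
    image: str,
    audio: Optional[str] = None,
    voice_script: Optional[str] = None,
    voice: str = "Zephyr (Female)",          # default stated on the page
    voice_language: str = "English (US)",    # default stated on the page
    resolution: str = "720p",                # default stated on the page
    seed: Optional[int] = None,
) -> dict:
    """Build an input dict; key names other than voice_script are assumed."""
    # Validate the image URL's file extension against the supported formats.
    path = urlparse(image).path
    extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if extension not in SUPPORTED_IMAGE_EXTENSIONS:
        raise ValueError(f"Unsupported image format: {extension or 'none'}")

    # The model needs something to say: an audio file or a script.
    if not audio and not voice_script:
        raise ValueError("Provide either an audio URL or a voice script.")

    payload = {"image": image, "resolution": resolution}
    if audio:
        # Audio takes priority, so the script and voice settings are omitted.
        payload["audio"] = audio
    else:
        payload["voice_script"] = voice_script
        payload["voice"] = voice
        payload["voice_language"] = voice_language
    if seed is not None:
        payload["seed"] = seed
    return payload


if __name__ == "__main__":
    print(build_talking_head_input(
        "https://example.com/portrait.png",
        voice_script="Hello, and welcome.",
        seed=42,
    ))
```

Dropping the script on the client when audio is present is a design choice made here for clarity; per the Audio description, sending both should also work, with the audio file winning.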
© 2026 Deep Infra. All rights reserved.