
Wan-AI/
$0.10 / second
Accurately preserve the look and voice of people or objects from a reference video, supporting multi-reference co-creation.

Prompt
Text prompt describing the video content. Use 'Image n' / 'Video n' identifiers to reference assets in the media array (in declaration order; images and videos counted separately).
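The numbering rule for identifiers can be sketched as below. This is a minimal illustration, not the service's own code: it assumes each media entry is a dict with a "type" key holding one of the type names used on this page (reference_image, reference_video, first_frame). The dict shape and the function name label_media are hypothetical. It also assumes a first_frame does not receive an 'Image n' number, which the page does not state.

```python
def label_media(media):
    """Return the prompt identifier for each media entry, in declaration order.

    Images and videos are counted separately. Entries that are neither
    reference_image nor reference_video get None (an assumption).
    """
    counters = {"reference_image": 0, "reference_video": 0}
    names = {"reference_image": "Image", "reference_video": "Video"}
    labels = []
    for item in media:
        kind = item.get("type")
        if kind in counters:
            counters[kind] += 1
            labels.append(f"{names[kind]} {counters[kind]}")
        else:
            labels.append(None)
    return labels


media = [
    {"type": "reference_video"},
    {"type": "reference_image"},
    {"type": "reference_image"},
]
print(label_media(media))  # ['Video 1', 'Image 1', 'Image 2']
```

With this ordering, a prompt would refer to the second image as 'Image 2' even though it is the third entry in the media array.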
Media
List of reference media. Must contain at least one reference_image or reference_video. The total of reference_image + reference_video must be <= 5. At most one first_frame is allowed.
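The three constraints above can be checked locally before submitting a request. The sketch below assumes the same hypothetical entry shape (a dict with a "type" key); validate_media is an illustrative helper, not part of any published client.

```python
from collections import Counter

MAX_REFERENCES = 5  # reference_image + reference_video combined


def validate_media(media):
    """Return a list of human-readable problems; an empty list means valid."""
    counts = Counter(item.get("type") for item in media)
    references = counts["reference_image"] + counts["reference_video"]
    errors = []
    if references < 1:
        errors.append("need at least one reference_image or reference_video")
    if references > MAX_REFERENCES:
        errors.append(f"too many references: {references} > {MAX_REFERENCES}")
    if counts["first_frame"] > 1:
        errors.append("at most one first_frame is allowed")
    return errors


print(validate_media([{"type": "first_frame"}]))
# ['need at least one reference_image or reference_video']
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.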
Negative Prompt
Negative prompt describing content to exclude. (Default: empty)
Resolution
Resolution tier of the generated video (720P or 1080P). Default 1080P
Ratio
Aspect ratio of the generated video. Default 16:9. Ignored if a first_frame is provided (the first frame's ratio is used).
Duration
Duration of the generated video in seconds. If media contains a reference_video, the allowed range is 2-10; otherwise it is 2-15. Default 5.
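The duration limits and the listed price of $0.10 / second combine into a simple cost estimate. The sketch below rests on two assumptions the page does not confirm: that the price applies to each second of generated video, and that durations are whole seconds. The function names are hypothetical.

```python
PRICE_CENTS_PER_SECOND = 10  # $0.10 / second, as listed on this page


def check_duration(duration=5, has_reference_video=False):
    """Validate a duration against the documented range and return it."""
    upper = 10 if has_reference_video else 15
    if not 2 <= duration <= upper:
        raise ValueError(f"duration must be between 2 and {upper} seconds")
    return duration


def estimate_cost_usd(duration):
    """Estimated price in USD, assuming billing per generated second."""
    return duration * PRICE_CENTS_PER_SECOND / 100


print(estimate_cost_usd(check_duration()))  # 0.5
print(estimate_cost_usd(check_duration(10, has_reference_video=True)))  # 1.0
```

Working in integer cents until the final division keeps the arithmetic exact for whole-second durations.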
Prompt Extend
Whether to enable prompt rewriting for better quality. Default true
Watermark
Whether to add an "AI Generated" watermark. Default false
Seed
Random seed for reproducibility (Default: empty, 0 ≤ seed ≤ 2147483647)
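The optional parameters and their documented defaults can be collected in one place. The sketch below uses a dataclass whose snake_case field names are hypothetical (the page only gives display labels), with None standing in for an empty seed. It checks only the ranges stated on this page.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SEED = 2147483647


@dataclass
class GenerationOptions:
    """Optional settings with the defaults documented on this page."""
    negative_prompt: str = ""
    resolution: str = "1080P"
    ratio: str = "16:9"
    duration: int = 5
    prompt_extend: bool = True
    watermark: bool = False
    seed: Optional[int] = None  # None means no seed was supplied

    def __post_init__(self):
        if self.resolution not in ("720P", "1080P"):
            raise ValueError("resolution must be 720P or 1080P")
        if not 2 <= self.duration <= 15:
            raise ValueError("duration must be between 2 and 15 seconds")
        if self.seed is not None and not 0 <= self.seed <= MAX_SEED:
            raise ValueError(f"seed must be between 0 and {MAX_SEED}")


options = GenerationOptions(resolution="720P", seed=42)
print(options.duration, options.prompt_extend)  # 5 True
```

Supplying a fixed seed, as in the usage line, is the documented way to aim for reproducible output.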
Wan2.6 reference-to-video flash offers faster and more cost-effective generation. It supports using a specified person or any object as a reference, precisely maintains consistency of appearance and voice, and allows multi-character references for joint performances.
© 2026 Deep Infra. All rights reserved.