- **Prompt**: text prompt describing the video content.
- **Seconds**: clip duration, fixed at 5 seconds for this model.
- **Resolution**: output resolution, fixed at 480p for this model.
- **Orientation**: landscape (832x480) or portrait (480x832).
- **Seed**: seed for reproducible output (default: empty).
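The input fields imply a few simple constraints: duration and resolution are fixed, and orientation selects one of two output sizes. The following is a minimal client-side sketch that encodes them. `VideoRequest`, its field names, and `ORIENTATION_SIZES` are hypothetical illustrations taken from the form labels, not DeepInfra's API schema; consult the API documentation for the real request format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (width, height) in pixels for each orientation listed on the model page.
ORIENTATION_SIZES = {
    "landscape": (832, 480),
    "portrait": (480, 832),
}


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical client-side container for this model's input fields.

    Field names mirror the form labels; they are NOT a documented API schema.
    """
    prompt: str
    orientation: str = "landscape"
    seed: Optional[int] = None  # None stands for the page's empty default
    seconds: int = 5            # fixed for this model
    resolution: str = "480p"    # fixed for this model

    def __post_init__(self) -> None:
        # Reject values outside the documented options.
        # (The non-empty prompt check is an extra assumption.)
        if not self.prompt.strip():
            raise ValueError("prompt must be non-empty")
        if self.seconds != 5:
            raise ValueError("this model only generates 5-second clips")
        if self.resolution != "480p":
            raise ValueError("this model only generates 480p output")
        if self.orientation not in ORIENTATION_SIZES:
            raise ValueError(f"orientation must be one of {sorted(ORIENTATION_SIZES)}")

    @property
    def size(self) -> Tuple[int, int]:
        """Output (width, height) for the chosen orientation."""
        return ORIENTATION_SIZES[self.orientation]


req = VideoRequest(prompt="A red fox running through fresh snow", orientation="portrait", seed=42)
print(req.size)  # (480, 832)
```

Validating on construction keeps unsupported combinations (for example a 10-second clip) from ever reaching a request.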
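FastWan-QAD-FP8-1.3B is the backward-compatible variant of the FastWan-QAD series, designed for GPUs without NVFP4 support, such as the RTX 4090 and other Ampere/Ada GPUs. It pairs FP8-quantized linear layers with SageAttention2++ and generates a 5-second 480p video in approximately 3.4 seconds on an RTX 4090. That is still well ahead of prior distilled methods such as TurboDiffusion (6.10 s) and LightX2V (6.91 s), even though those were measured on the faster RTX 5090.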
The model is built on Wan-AI/Wan2.1-T2V-1.3B-Diffusers and trained with quantization-aware distillation (QAD) for 3-step inference. For RTX 5090 users, see FastWan-QAD-1.3B for maximum speed with NVFP4.
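To make "FP8-quantized linear layers" concrete, here is a minimal pure-Python sketch of simulated ("fake") FP8 quantization of a linear layer. It assumes the FP8 E4M3 format (largest finite value 448), one scale per tensor, and higher-precision accumulation. The page does not say which FP8 format, scaling granularity, or kernels FastWan-QAD-FP8-1.3B actually uses, and SageAttention2++ is not modeled; `round_to_e4m3`, `quantize_per_tensor`, and `fp8_linear` are hypothetical helpers, not FastVideo APIs.

```python
import math
from typing import List, Tuple

E4M3_MAX = 448.0          # largest finite FP8 E4M3 value
E4M3_MIN_NORMAL_EXP = -6  # smallest normal exponent; subnormals use its step
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) == e - 1.
    exp = max(math.frexp(a)[1] - 1, E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # spacing of E4M3 values here
    q = round(a / step) * step  # Python's round() is round-half-to-even
    return sign * min(q, E4M3_MAX)


def quantize_per_tensor(values: List[float]) -> Tuple[List[float], float]:
    """Scale so the largest magnitude maps to 448, then round to E4M3."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / E4M3_MAX
    return [round_to_e4m3(v / scale) for v in values], scale


def fp8_linear(x: List[float], w: List[List[float]]) -> List[float]:
    """y = W x with both operands fake-quantized to E4M3 (hypothetical helper)."""
    xq, sx = quantize_per_tensor(x)
    wq_flat, sw = quantize_per_tensor([v for row in w for v in row])
    n = len(x)
    wq = [wq_flat[i * n:(i + 1) * n] for i in range(len(w))]
    # Accumulate in full precision, then undo both per-tensor scales.
    return [sum(a * b for a, b in zip(row, xq)) * sx * sw for row in wq]


x = [0.5, -1.25, 2.0, 0.1]
w = [[0.3, -0.7, 0.2, 1.1], [-0.4, 0.05, 0.9, -0.6]]
exact = [sum(a * b for a, b in zip(row, x)) for row in w]
approx = fp8_linear(x, w)
print(exact, approx)
```

With only three mantissa bits, each element carries up to about 6% rounding error, which is why quantization-aware training or distillation is used to let the model adapt to it rather than quantizing a full-precision checkpoint after the fact.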
| Model | Hardware | Generation Time (5s 480p) |
|---|---|---|
| FastWan-QAD-1.3B | RTX 5090 | ~1.78s |
| FastWan-QAD-1.3B-SA2 | RTX 5090 | ~2.0s |
| FastWan-QAD-FP8-1.3B | RTX 4090 | ~3.4s |
| TurboDiffusion | RTX 5090 | 6.10s |
| LightX2V | RTX 5090 | 6.91s |
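The table mixes GPUs, so only same-hardware rows support a direct speedup claim. This small sketch computes speedups for the RTX 5090 rows and the real-time factor (seconds of video per second of compute) using the numbers from the table; values marked "~" there are approximate, and the helper names are illustrative.

```python
from typing import Dict, Tuple

# (hardware, seconds to generate one 5-second 480p clip), from the table above.
BENCHMARKS: Dict[str, Tuple[str, float]] = {
    "FastWan-QAD-1.3B": ("RTX 5090", 1.78),
    "FastWan-QAD-1.3B-SA2": ("RTX 5090", 2.0),
    "FastWan-QAD-FP8-1.3B": ("RTX 4090", 3.4),
    "TurboDiffusion": ("RTX 5090", 6.10),
    "LightX2V": ("RTX 5090", 6.91),
}
CLIP_SECONDS = 5.0


def speedup(model: str, baseline: str) -> float:
    """How many times faster `model` is than `baseline`, on the same GPU only."""
    hw_m, t_m = BENCHMARKS[model]
    hw_b, t_b = BENCHMARKS[baseline]
    if hw_m != hw_b:
        raise ValueError(f"{model} ({hw_m}) and {baseline} ({hw_b}) ran on different GPUs")
    return t_b / t_m


def realtime_factor(model: str) -> float:
    """Seconds of video per second of wall-clock time (> 1 is faster than real time)."""
    return CLIP_SECONDS / BENCHMARKS[model][1]


for base in ("TurboDiffusion", "LightX2V"):
    print(f"FastWan-QAD-1.3B vs {base}: {speedup('FastWan-QAD-1.3B', base):.2f}x")
print(f"FastWan-QAD-FP8-1.3B real-time factor: {realtime_factor('FastWan-QAD-FP8-1.3B'):.2f}")
# FastWan-QAD-1.3B vs TurboDiffusion: 3.43x
# FastWan-QAD-1.3B vs LightX2V: 3.88x
# FastWan-QAD-FP8-1.3B real-time factor: 1.47
```

Even on the RTX 4090, the FP8 variant produces video about 1.47 times faster than real time.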
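To run the model locally, install the Tiny Autoencoder and FastVideo, then launch the FP8 example:

```bash
# Install the Tiny Autoencoder (taehv)
git clone https://github.com/madebyollin/taehv.git
uv pip install -e taehv/

# Install FastVideo
git clone https://github.com/hao-ai-lab/FastVideo.git
cd FastVideo
uv pip install -e .

# Run the FP8 inference example, pointing it at the taehv checkpoint
cd examples/inference/optimizations
python fp8_wan2_1_1_3b.py --taehv-checkpoint /path/to/taehv/taew2_1.pth
```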
More details coming soon.
If you use this model, please cite our paper:

```bibtex
@article{Zhang2026AttnQAT,
  title={Attn-QAT: 4-Bit Attention With Quantization-Aware Training},
  author={Zhang, Peiyuan and Noto, Matthew and Tan, Wenxuan and Jiang, Chengquan and Lin, Will and Zhou, Wei and Zhang, Hao},
  journal={arXiv preprint arXiv:2603.00040},
  year={2026}
}
```
© 2026 DeepInfra. All rights reserved.