FastVideo/FastWan-QAD-FP8-1.3B

$0.0025 / second (480p)

A fast, compact 480p text-to-video model that generates 5-second clips (landscape or portrait) from a text prompt. It is a 3-step, FP8 quantization-aware distillation of Wan2.1-T2V-1.3B by FastVideo (Hao AI Lab).

Input

Prompt

Text prompt describing the video content.

Seconds

Clip duration: always 5 seconds (fixed/required for this model).

Resolution

Output resolution: always 480p (fixed/required for this model).

Orientation

Output orientation: landscape (832x480) or portrait (480x832).

Settings

Seed

Seed for reproducible output (default: empty).
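
The input fields above can be summarized in a small helper. This is a minimal sketch, not a confirmed API schema: the dictionary keys (`prompt`, `seconds`, `resolution`, `orientation`, `seed`) are assumed names that mirror the form labels, and the cost estimate assumes the listed $0.0025 rate is billed per second of generated video.

```python
from typing import Optional

# Frame sizes listed on this page for each orientation, as (width, height).
FRAME_SIZES = {"landscape": (832, 480), "portrait": (480, 832)}

FIXED_SECONDS = 5          # clip duration is fixed for this model
FIXED_RESOLUTION = "480p"  # output resolution is fixed for this model
PRICE_PER_SECOND = 0.0025  # USD; assumed to apply per second of output video


def build_request(prompt: str, orientation: str = "landscape",
                  seed: Optional[int] = None) -> dict:
    """Build a request payload. The key names are hypothetical."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if orientation not in FRAME_SIZES:
        raise ValueError(f"orientation must be one of {sorted(FRAME_SIZES)}")
    payload = {
        "prompt": prompt,
        "seconds": FIXED_SECONDS,
        "resolution": FIXED_RESOLUTION,
        "orientation": orientation,
    }
    if seed is not None:  # leaving the seed empty is the default
        payload["seed"] = seed
    return payload


def estimated_cost(seconds: int = FIXED_SECONDS) -> float:
    """Estimated price in USD under the per-output-second assumption."""
    return seconds * PRICE_PER_SECOND
```

Under these assumptions, `build_request("a red fox running through snow", "portrait", seed=42)` yields a payload for a 480x832 clip, and `estimated_cost()` multiplies the fixed 5-second duration by the listed rate.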

Model Information

FastWan-QAD-FP8-1.3B

Introduction

FastWan-QAD-FP8-1.3B is the backward-compatible variant of the FastWan-QAD series, designed for the RTX 4090 and other Ampere/Ada GPUs. It pairs FP8-quantized linear layers with SageAttention2++ and generates a 5-second 480p video in approximately 3.4 seconds on an RTX 4090, compared with the 6.10 s and 6.91 s listed in the Performance table for TurboDiffusion and LightX2V on an RTX 5090.

The model is built on Wan-AI/Wan2.1-T2V-1.3B-Diffusers and trained with quantization-aware distillation (QAD) for 3-step inference. For RTX 5090 users, see FastWan-QAD-1.3B for maximum speed with NVFP4.
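
The choice between the two variants can be sketched as a lookup on the GPU's CUDA compute capability. This is an illustrative helper, not part of FastVideo: the function names are hypothetical, the capability-to-architecture mapping reflects NVIDIA's published numbering (8.0/8.6/8.7 Ampere, 8.9 Ada, 9.0 Hopper, 10.x/12.x Blackwell), and treating every Blackwell part like the RTX 5090 is a simplifying assumption, since the text above only names the RTX 5090.

```python
from typing import Optional, Tuple


def arch_name(capability: Tuple[int, int]) -> str:
    """Map a CUDA compute capability (major, minor) to an architecture name."""
    major, minor = capability
    if major == 8 and minor == 9:
        return "Ada"
    if major == 8:
        return "Ampere"
    if major == 9:
        return "Hopper"
    if major in (10, 12):
        return "Blackwell"
    return "other"


def suggest_variant(capability: Tuple[int, int]) -> Optional[str]:
    """Suggest a FastWan-QAD variant, or None if neither is described as a fit."""
    arch = arch_name(capability)
    if arch == "Blackwell":
        # Assumption: all Blackwell GPUs are handled like the RTX 5090 (NVFP4).
        return "FastWan-QAD-1.3B"
    if arch in ("Ampere", "Ada", "Hopper"):
        return "FastWan-QAD-FP8-1.3B"
    return None
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns a tuple of this form, so `suggest_variant(torch.cuda.get_device_capability())` would report the suggestion for the current device.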

Model Overview

3-step inference via quantization-aware distillation
FP8 linear layers compatible with Ampere, Ada, and Hopper GPUs
SageAttention2++ backend for attention computation
Trained at 480p (832×480) resolution, 81 frames (5 seconds at 16 fps)
No classifier-free guidance at inference time
Fast decoding via TAEHV tiny autoencoder
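
The frame count in the overview follows from the frame rate and clip length. The short sketch below works through that arithmetic. It assumes the first frame sits at t = 0 and the last at t = 5 s, so a 5-second clip at 16 fps spans 80 frame intervals and therefore 81 frames. The function name is illustrative.

```python
FPS = 16                  # frames per second, from the overview
CLIP_SECONDS = 5          # clip length in seconds
WIDTH, HEIGHT = 832, 480  # landscape training resolution


def frames_for(seconds: int, fps: int) -> int:
    """Frames needed to cover `seconds` when both endpoints are included."""
    return seconds * fps + 1


num_frames = frames_for(CLIP_SECONDS, FPS)  # 5 * 16 + 1
span_seconds = (num_frames - 1) / FPS       # time from first frame to last
pixels_per_frame = WIDTH * HEIGHT

print(f"{num_frames} frames, {span_seconds} s from first to last frame")
print(f"{pixels_per_frame} pixels per frame")
```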

Performance

Model	Hardware	Generation Time (5s 480p)
FastWan-QAD-1.3B	RTX 5090	~1.78s
FastWan-QAD-1.3B-SA2	RTX 5090	~2.0s
FastWan-QAD-FP8-1.3B	RTX 4090	~3.4s
TurboDiffusion	RTX 5090	6.10s
LightX2V	RTX 5090	6.91s
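
To put the table in relative terms, the sketch below computes how much faster than real time each model runs (clip length divided by generation time) and the ratio between two entries. The figures are copied from the table; values marked "~" there are approximate, and the FP8 row was measured on an RTX 4090 while the others use an RTX 5090, so cross-row ratios are indicative only.

```python
CLIP_SECONDS = 5.0

# Generation time in seconds for a 5-second 480p clip, from the table above.
GENERATION_TIME_S = {
    "FastWan-QAD-1.3B": 1.78,      # RTX 5090, approximate
    "FastWan-QAD-1.3B-SA2": 2.0,   # RTX 5090, approximate
    "FastWan-QAD-FP8-1.3B": 3.4,   # RTX 4090, approximate
    "TurboDiffusion": 6.10,        # RTX 5090
    "LightX2V": 6.91,              # RTX 5090
}


def realtime_factor(model: str) -> float:
    """Seconds of video produced per second of generation time."""
    return CLIP_SECONDS / GENERATION_TIME_S[model]


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` is than `baseline`."""
    return GENERATION_TIME_S[baseline] / GENERATION_TIME_S[candidate]


for name in GENERATION_TIME_S:
    print(f"{name}: {realtime_factor(name):.2f}x real time")
print(f"FP8 vs TurboDiffusion: {speedup('TurboDiffusion', 'FastWan-QAD-FP8-1.3B'):.2f}x")
```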

Inference

# Install the TAEHV tiny autoencoder
git clone https://github.com/madebyollin/taehv.git
uv pip install -e taehv/

# Install FastVideo from source
git clone https://github.com/hao-ai-lab/FastVideo.git
cd FastVideo
uv pip install -e .

# Run the FP8 example, pointing it at the TAEHV checkpoint
cd examples/inference/optimizations
python fp8_wan2_1_1_3b.py --taehv-checkpoint /path/to/taehv/taew2_1.pth

Training

More details coming soon.

If you use this model, please cite our paper:

@article{Zhang2026AttnQAT,
  title={Attn-QAT: 4-Bit Attention With Quantization-Aware Training},
  author={Zhang, Peiyuan and Noto, Matthew and Tan, Wenxuan and Jiang, Chengquan and Lin, Will and Zhou, Wei and Zhang, Hao},
  journal={arXiv preprint arXiv:2603.00040},
  year={2026}
}