FastVideo/LTX2-Distilled-Diffusers

$0.0360 / second

LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.

Public

Input

Prompt

text prompt describing the video content

Settings

Negative Prompt

negative text prompt. (Default: empty)

Seed

specify a seed for reproducible output (Default: empty)

Seconds

Video duration in seconds.

Resolution

Video resolution.

Orientation

Video orientation.
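
The input fields above could be assembled into a request body along the following lines. This is a minimal sketch, not the documented API: the JSON key names (`prompt`, `negative_prompt`, `seed`, `seconds`, `resolution`, `orientation`) and the helper `build_video_request` are assumptions inferred from the field labels, and the accepted values for resolution and orientation are not stated on this page. Check the model's API documentation for the real schema.

```python
from typing import Any, Optional


def build_video_request(
    prompt: str,
    negative_prompt: str = "",
    seed: Optional[int] = None,
    seconds: Optional[float] = None,
    resolution: Optional[str] = None,
    orientation: Optional[str] = None,
) -> dict[str, Any]:
    """Build a request body from the form fields, omitting unset options.

    All key names are hypothetical; they only mirror the field labels.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")

    payload: dict[str, Any] = {"prompt": prompt.strip()}
    optional_fields = {
        "negative_prompt": negative_prompt or None,  # default: empty
        "seed": seed,                                # default: empty
        "seconds": seconds,
        "resolution": resolution,
        "orientation": orientation,
    }
    # Leave out anything the caller did not set so server defaults apply.
    payload.update({k: v for k, v in optional_fields.items() if v is not None})
    return payload


request_body = build_video_request(
    "A red kite rises over a windy beach at sunset.",
    seed=42,
    seconds=5,
)
```

Fixing the seed while leaving the other options unset keeps a run reproducible without hard-coding values whose allowed range is not listed here.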

Output

Model Information

LTX-2 is the first DiT-based audio-video foundation model that contains all core capabilities of modern video generation in one model: synchronized audio and video, high fidelity, multiple performance modes, production-ready outputs, API access, and open access.

🚀 Quick Start

Clone the repository

git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

Set up the environment

uv sync --frozen
source .venv/bin/activate

Required Models

Download the following models from the LTX-2.3 HuggingFace repository:

LTX-2.3 Model Checkpoint (choose and download one of the following)

ltx-2.3-22b-dev.safetensors
ltx-2.3-22b-distilled-1.1.safetensors

Spatial Upscaler - Required for current two-stage pipeline implementations in this repository

ltx-2.3-spatial-upscaler-x2-1.1.safetensors
ltx-2.3-spatial-upscaler-x1.5-1.0.safetensors

Temporal Upscaler - Supported by the model and will be required for future pipeline implementations

ltx-2.3-temporal-upscaler-x2-1.0.safetensors

Distilled LoRA - Required for current two-stage pipeline implementations in this repository (except DistilledPipeline, ICLoraPipeline, and LipDubPipeline)

ltx-2.3-22b-distilled-lora-384-1.1.safetensors

Gemma Text Encoder (download all assets from the repository)

Gemma 3

LoRAs

LTX-2.3-22b-IC-LoRA-Union-Control
LTX-2.3-22b-IC-LoRA-Motion-Track-Control
LTX-2-19b-IC-LoRA-Detailer
LTX-2-19b-IC-LoRA-Pose-Control
LTX-2-19b-LoRA-Camera-Control-Dolly-In
LTX-2-19b-LoRA-Camera-Control-Dolly-Left
LTX-2-19b-LoRA-Camera-Control-Dolly-Out
LTX-2-19b-LoRA-Camera-Control-Dolly-Right
LTX-2-19b-LoRA-Camera-Control-Jib-Down
LTX-2-19b-LoRA-Camera-Control-Jib-Up
LTX-2-19b-LoRA-Camera-Control-Static
LTX-2.3-22b-IC-LoRA-HDR - HDR IC-LoRA and pre-computed text embeddings for HDRICLoraPipeline
LTX-2.3-22b-IC-LoRA-LipDub

Available Pipelines

TI2VidTwoStagesPipeline - Production-quality text/image-to-video with 2x upsampling (recommended)
TI2VidTwoStagesHQPipeline - Same two-stage flow as above but uses the res_2s second-order sampler (fewer steps, better quality)
TI2VidOneStagePipeline - Single-stage generation for quick prototyping
DistilledPipeline - Fastest inference with 8 predefined sigmas
ICLoraPipeline - Video-to-video and image-to-video transformations (uses the distilled model)
KeyframeInterpolationPipeline - Interpolate between keyframe images
A2VidPipelineTwoStage - Audio-to-video generation conditioned on an input audio file
RetakePipeline - Regenerate a specific time region of an existing video
HDRICLoraPipeline - Video-to-video with HDR output (linear float frames via LogC3 inverse decode, suitable for EXR export and tonemapping)
LipDubPipeline - Lip dubbing, rephrasing, matching speaker identity (distilled model, single IC-LoRA, two stages)

⚡ Optimization Tips

Use DistilledPipeline - Fastest inference with only 8 predefined sigmas (8 steps stage 1, 4 steps stage 2)
Enable FP8 quantization - Lowers the memory footprint: --quantization fp8-cast (CLI) or quantization=QuantizationPolicy.fp8_cast() (Python). Use fp8-cast with bf16 checkpoints; it downcasts them on the fly. For Hopper GPUs with TensorRT-LLM, use --quantization fp8-scaled-mm for FP8 scaled matrix multiplication. Use fp8-scaled-mm with fp8 checkpoints.
Install attention optimizations - Use xFormers (uv sync --extra xformers) or Flash Attention 3 for Hopper GPUs
Use gradient estimation - Reduce inference steps from 40 to 20-30 while maintaining quality (see pipeline documentation)
Skip memory cleanup - If you have sufficient VRAM, disable automatic memory cleanup between stages for faster processing
Choose single-stage pipeline - Use TI2VidOneStagePipeline for faster generation when high resolution isn't required

✍️ Prompting for LTX-2

When writing prompts, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details, all in a single flowing paragraph. Start directly with the action, and keep descriptions literal and precise. Think like a cinematographer describing a shot list. Keep within 200 words. For best results, build your prompts using this structure:

Start with main action in a single sentence
Add specific details about movements and gestures
Describe character/object appearances precisely
Include background and environment details
Specify camera angles and movements
Describe lighting and colors
Note any changes or sudden events

For additional guidance on writing a prompt, please refer to https://ltx.video/blog/how-to-prompt-for-ltx-2
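
The prompt structure above can be sketched as a small helper that joins the parts, in order, into one paragraph and enforces the 200-word limit. This is an illustrative sketch only: `ShotDescription`, `build_prompt`, and the field names are hypothetical and are not part of the LTX-2 packages.

```python
from dataclasses import dataclass

MAX_WORDS = 200  # from the guidance: "Keep within 200 words"


@dataclass
class ShotDescription:
    """One field per step of the suggested structure (names are hypothetical)."""
    main_action: str
    movements: str = ""
    appearances: str = ""
    environment: str = ""
    camera: str = ""
    lighting: str = ""
    changes: str = ""


def build_prompt(shot: ShotDescription) -> str:
    """Join the non-empty parts, in order, into a single flowing paragraph."""
    if not shot.main_action.strip():
        raise ValueError("main_action is required: start directly with the action")

    parts = [
        shot.main_action, shot.movements, shot.appearances, shot.environment,
        shot.camera, shot.lighting, shot.changes,
    ]
    sentences = []
    for part in parts:
        text = " ".join(part.split())  # collapse line breaks and extra spaces
        if not text:
            continue
        if text[-1] not in ".!?":
            text += "."
        sentences.append(text)

    prompt = " ".join(sentences)
    word_count = len(prompt.split())
    if word_count > MAX_WORDS:
        raise ValueError(f"prompt has {word_count} words; keep within {MAX_WORDS}")
    return prompt


example = build_prompt(ShotDescription(
    main_action="A woman in a yellow raincoat walks across a wet city street",
    movements="She glances over her shoulder and quickens her pace",
    camera="The camera tracks her from behind at shoulder height",
    lighting="Neon signs cast pink and blue reflections on the pavement",
))
```

Keeping each step in its own field makes it easy to see which parts of the structure a prompt is still missing before it is sent to a pipeline.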

Automatic Prompt Enhancement

LTX-2 pipelines support automatic prompt enhancement via an enhance_prompt parameter.

🔌 ComfyUI Integration

To use our model with ComfyUI, please follow the instructions at https://github.com/Lightricks/ComfyUI-LTXVideo/.

📦 Packages

This repository is organized as a monorepo with three main packages:

ltx-core - Core model implementation, inference stack, and utilities
ltx-pipelines - High-level pipeline implementations for text-to-video, image-to-video, and other generation modes
ltx-trainer - Training and fine-tuning tools for LoRA, full fine-tuning, and IC-LoRA

Each package has its own README and documentation. See the Documentation section below.

📚 Documentation

Each package includes comprehensive documentation:

LTX-Core README - Core model implementation, inference stack, and utilities
LTX-Pipelines README - High-level pipeline implementations and usage guides
LTX-Trainer README - Training and fine-tuning documentation with detailed guides