
ResembleAI/
$1.00 / 1M characters
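As a quick check on the rate, the sketch below converts a character count into an estimated charge. It assumes that every character of the input text is billed, including spaces and paralinguistic tags. The page does not say how characters are counted, so treat the result as an estimate.

```python
# Sketch: estimate cost at the listed rate of $1.00 per 1M characters.
# ASSUMPTION: every input character (spaces and tags such as "[laugh]"
# included) counts toward billing; the page does not specify this.
PRICE_PER_MILLION_CHARS = 1.00  # USD, from the pricing line above


def estimate_cost_usd(text: str) -> float:
    """Return the estimated charge in USD for converting `text` to speech."""
    return len(text) / 1_000_000 * PRICE_PER_MILLION_CHARS


if __name__ == "__main__":
    sample = "Hello from Chatterbox-Turbo! [chuckle] Let's get started."
    print(f"{len(sample)} chars -> ${estimate_cost_usd(sample):.6f}")
    # A 250,000-character audiobook chapter:
    print(f"${estimate_cost_usd('x' * 250_000):.2f}")  # prints $0.25
```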
Chatterbox is a family of three state-of-the-art, open-source text-to-speech (TTS) models from Resemble AI. We are excited to introduce Chatterbox-Turbo, our most efficient model yet. Built on a streamlined 350M-parameter architecture, Turbo delivers high-quality speech with less compute and VRAM than our previous models. We have also distilled the speech-token-to-mel decoder, previously a bottleneck, cutting it from 10 generation steps to just one while retaining high-fidelity audio output.

Paralinguistic tags are now native to Turbo: write tags such as [cough], [laugh], and [chuckle] into the input text to add distinct realism. Although Turbo was built primarily for low-latency voice agents, it also excels at narration and creative workflows. If you like the model but need to scale it or tune it for higher accuracy, check out our competitively priced TTS service.
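Because the tags are written inline in the input text, a client can scan a script for them before submitting it, for example to catch typos. The following is a minimal sketch and not part of any Resemble AI or DeepInfra tooling: it assumes tags are lowercase words in square brackets and flags any name outside the three listed on this page. Since the page says "and more", an unrecognized tag is not necessarily invalid; it may simply be undocumented here.

```python
import re

# Tags named on this page. The page says "and more", so this set is
# deliberately incomplete; extend it from the official tag list.
DOCUMENTED_TAGS = {"cough", "laugh", "chuckle"}

# ASSUMPTION: tags are lowercase words (or underscores) in square brackets.
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_tags(text: str) -> list[str]:
    """Return the bracketed tag names that appear in `text`, in order."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str, known: set[str] = DOCUMENTED_TAGS) -> list[str]:
    """Return tags not in `known` (possible typos or undocumented tags)."""
    return [tag for tag in find_tags(text) if tag not in known]


script = "Sorry [cough] where was I? Oh right [laugh] the demo. [chukle]"
print(find_tags(script))     # ['cough', 'laugh', 'chukle']
print(unknown_tags(script))  # ['chukle']
```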

Input text
Text to convert to speech
Voice ID
ID of a voice created on DeepInfra (Default: empty)
Settings
ServiceTier
The service tier used to process the request. When set to 'priority', the request is processed with higher priority (applies only to models that support it).
TtsResponseFormat
Select the desired format for the speech output. Supported formats include mp3, opus, flac, wav, and pcm.
Exaggeration
Emotion exaggeration (expressiveness) factor for the speech (Default: 0, 0 ≤ exaggeration ≤ 1)
CFG
Classifier-free guidance (CFG) weight for the speech (Default: 0, 0 ≤ cfg ≤ 1)
Temperature
Sampling temperature; higher values give more varied output, lower values more deterministic output (Default: 0.8, 0 ≤ temperature ≤ 2)
Seed
Seed for the random number generator (Default: empty, 0 ≤ seed ≤ 2147483647)
Top P
Nucleus (top-p) sampling threshold: sample only from the most likely tokens whose cumulative probability reaches top_p (Default: 0.95, 0 ≤ top_p ≤ 1)
Min P
Min-p sampling threshold: discard tokens whose probability is below min_p times that of the most likely token (Default: 0, 0 ≤ min_p ≤ 1)
Repetition Penalty
Penalty applied to tokens that have already been generated, discouraging repetition; 1 means no penalty (Default: 1.2, 0 ≤ repetition_penalty ≤ 5)
Top K
Top-k sampling: sample only from the k most likely tokens (Default: 1000, 0 ≤ top_k ≤ 1000)
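A client can check settings against the documented ranges before sending a request. The sketch below does this locally. The setting names (exaggeration, cfg, temperature, seed, top_p, min_p, repetition_penalty, top_k) come from the ranges listed above. The field names text, voice_id, response_format, and service_tier, and the flat JSON layout, are HYPOTHETICAL. No endpoint is shown because this page does not give one.

```python
import json
from typing import Any, Optional

# Inclusive ranges copied from the settings list above.
RANGES = {
    "exaggeration": (0.0, 1.0),
    "cfg": (0.0, 1.0),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "min_p": (0.0, 1.0),
    "repetition_penalty": (0.0, 5.0),
    "top_k": (0, 1000),
    "seed": (0, 2_147_483_647),
}
INTEGER_SETTINGS = {"top_k", "seed"}
# Formats listed for TtsResponseFormat.
FORMATS = {"mp3", "opus", "flac", "wav", "pcm"}


def build_tts_payload(text: str, *, voice_id: Optional[str] = None,
                      response_format: Optional[str] = None,
                      service_tier: Optional[str] = None,
                      **settings: Any) -> dict:
    """Validate settings and return a request body as a dict.

    HYPOTHETICAL field names: "text", "voice_id", "response_format" and
    "service_tier" are guesses; check the real API reference before use.
    Unset options are omitted so that server-side defaults apply.
    """
    if not text:
        raise ValueError("input text must be non-empty")
    if response_format is not None and response_format not in FORMATS:
        raise ValueError(f"unsupported format {response_format!r}")
    payload: dict = {"text": text}
    for key, value in (("voice_id", voice_id),
                       ("response_format", response_format),
                       ("service_tier", service_tier)):
        if value is not None:
            payload[key] = value
    for name, value in settings.items():
        if name not in RANGES:
            raise ValueError(f"unknown setting {name!r}")
        if name in INTEGER_SETTINGS and not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} outside [{low}, {high}]")
        payload[name] = value
    return payload


body = build_tts_payload("Welcome back! [chuckle]", response_format="mp3",
                         temperature=0.7, seed=42)
print(json.dumps(body))
```

The sketch leaves unset options out of the body instead of sending the defaults explicitly. That way the request still reflects the service's defaults if they change.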
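Temperature, Top K, Top P, Min P, and Repetition Penalty are standard controls for autoregressive sampling. The sketch below shows the generic technique for selecting the next token in one step, written in the style of common open-source samplers such as Hugging Face transformers. It is not Chatterbox-Turbo's actual decoding code. Three choices are ASSUMPTIONS: the order in which the filters run, treating temperature 0 as greedy decoding, and treating top_k 0 as disabled. Exaggeration and CFG are not modeled.

```python
import math
import random
from typing import Optional, Sequence


def sample_token(logits: Sequence[float], history: Sequence[int], *,
                 temperature: float = 0.8, top_k: int = 1000,
                 top_p: float = 0.95, min_p: float = 0.0,
                 repetition_penalty: float = 1.2,
                 rng: Optional[random.Random] = None) -> int:
    """Pick the next token id from raw `logits` (defaults mirror this page)."""
    rng = rng or random.Random()
    scores = list(logits)

    # Repetition penalty (CTRL-style): shrink positive logits and push negative
    # ones further down for every token already in `history`.
    if repetition_penalty > 0:
        for tok in set(history):
            s = scores[tok]
            scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Temperature 0 is treated as greedy decoding (assumed edge-case behavior).
    if temperature == 0:
        return max(range(len(scores)), key=scores.__getitem__)

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token ids ranked from most to least likely.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)

    # Top-k: keep only the k most likely tokens (k = 0 treated as disabled).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p (nucleus): keep the shortest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for tok in ranked:
        kept.append(tok)
        mass += probs[tok]
        if mass >= top_p:
            break

    # Min-p: drop tokens less likely than min_p times the best token.
    cutoff = min_p * probs[kept[0]]
    kept = [tok for tok in kept if probs[tok] >= cutoff]

    # random.choices normalizes the surviving weights before drawing.
    return rng.choices(kept, weights=[probs[t] for t in kept], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print(sample_token(logits, history=[0], rng=random.Random(0)))
print(sample_token(logits, history=[], temperature=0))  # greedy: prints 0
```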
© 2026 Deep Infra. All rights reserved.