
ResembleAI/chatterbox-turbo

$1.00 / 1M characters

Chatterbox is a family of three state-of-the-art, open-source text-to-speech models by Resemble AI. We are excited to introduce Chatterbox-Turbo, our most efficient model yet. Built on a streamlined 350M-parameter architecture, Turbo delivers high-quality speech with less compute and VRAM than our previous models. We have also distilled the speech-token-to-mel decoder, previously a bottleneck, reducing generation from 10 steps to just one while retaining high-fidelity audio output.

Paralinguistic tags are now native to the Turbo model, so you can use [cough], [laugh], [chuckle], and more to add distinct realism. While Turbo was built primarily for low-latency voice agents, it also excels at narration and creative workflows.

If you like the model but need to scale it or tune it for higher accuracy, check out our competitively priced TTS service. It delivers reliable performance with ultra-low latency of under 200 ms, making it ideal for production use in agents, applications, or interactive media.

Input

Input text

Text to convert to speech

Voice ID

ID of a voice created on DeepInfra (Default: empty)
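DeepInfra models are generally called over HTTP with a JSON body. The sketch below builds (but does not send) such a request using only Python's standard library. Several details are assumptions to check against DeepInfra's API documentation: the endpoint URL (following a `/v1/inference/<model>` pattern), the JSON field names `text` and `voice_id`, the Bearer-token header, and the `DEEPINFRA_TOKEN` environment variable. Optional sampling settings are passed through as keyword arguments named after this page's parameters (`temperature`, `top_p`, and so on).

```python
import json
import os
import urllib.request

# ASSUMPTION: endpoint pattern for DeepInfra's inference API; verify before use.
API_URL = "https://api.deepinfra.com/v1/inference/ResembleAI/chatterbox-turbo"


def build_tts_request(text, voice_id=None, token=None, **settings):
    """Build (but do not send) a POST request for the TTS model.

    ASSUMPTION: the JSON field names "text" and "voice_id" are hypothetical;
    extra keyword arguments (e.g. temperature=0.8) are copied into the body
    under the parameter names shown on the model page.
    """
    if not text:
        raise ValueError("text must be non-empty")
    body = {"text": text, **settings}
    if voice_id:  # Voice ID defaults to empty, so only include it when given
        body["voice_id"] = voice_id
    # ASSUMPTION: hypothetical environment variable holding the API token.
    token = token or os.environ.get("DEEPINFRA_TOKEN", "")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Paralinguistic tags such as [chuckle] go directly into the input text.
req = build_tts_request("Well [chuckle] that went better than expected.", temperature=0.8)
```

Passing `req` to `urllib.request.urlopen` would send it. The response format, and how the streamed audio is delivered, is not described on this page, so it is left out of the sketch.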

Settings

Exaggeration

Emotion exaggeration (expressiveness) factor for the speech (Default: 0, 0 ≤ exaggeration ≤ 1)

CFG

Classifier-free guidance (CFG) weight for the speech (Default: 0, 0 ≤ cfg ≤ 1)

Temperature

Sampling temperature; higher values produce more varied speech (Default: 0.8, 0 ≤ temperature ≤ 2)

Seed

Seed for the random number generator (Default: empty, 0 ≤ seed ≤ 2147483647)

Top P

Nucleus-sampling threshold: sample only from the smallest set of tokens whose cumulative probability reaches top_p (Default: 0.95, 0 ≤ top_p ≤ 1)

Min P

Minimum-probability threshold: discard tokens whose probability is below min_p times that of the most likely token (Default: 0, 0 ≤ min_p ≤ 1)

Repetition Penalty

Penalty applied to tokens that have already been generated; values above 1 discourage repetition (Default: 1.2, 0 ≤ repetition_penalty ≤ 5)

Top K

Number of most likely tokens to sample from at each step (Default: 1000, 0 ≤ top_k ≤ 1000)
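Temperature, Top K, Top P, Min P and Repetition Penalty are the standard controls for sampling tokens from an autoregressive model, which is presumably how they are applied to speech tokens here. The sketch below is a generic, stdlib-only illustration of how such controls typically act on one step's logits, with defaults copied from this page. The function name `sample_next_token`, the order in which the filters are applied, and the CTRL-style repetition penalty are assumptions, not Chatterbox-Turbo's or DeepInfra's actual implementation; Exaggeration and CFG are not modelled.

```python
import math
import random


def sample_next_token(logits, prev_tokens, temperature=0.8, top_k=1000,
                      top_p=0.95, min_p=0.0, repetition_penalty=1.2, rng=None):
    """Pick one token id from raw logits (hypothetical helper, generic technique).

    Defaults mirror the page's defaults; the filter order is one common
    convention and is an assumption, not necessarily the model's own.
    """
    rng = rng or random.Random()
    logits = list(logits)

    # Repetition penalty (CTRL-style, an assumption here): shrink positive
    # logits and grow negative ones for tokens that already appeared.
    # Values <= 0 are treated as "no penalty" in this sketch.
    if repetition_penalty > 0:
        for t in set(prev_tokens):
            if logits[t] > 0:
                logits[t] /= repetition_penalty
            else:
                logits[t] *= repetition_penalty

    # Temperature 0 means greedy decoding: take the highest logit.
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids by probability, highest first.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)

    # Top-k: keep at most k candidates (0 is treated as "no limit" here).
    if top_k > 0:
        order = order[:top_k]

    # Min-p: drop tokens below min_p times the top token's probability.
    threshold = min_p * probs[order[0]]
    order = [i for i in order if probs[i] >= threshold]

    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample among the survivors; choices() normalises the weights itself.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

With `temperature=0` the function falls back to greedy decoding, and with `top_p=0` or `top_k=1` it always returns the single most likely token after the repetition penalty, which makes the effect of each control easy to observe in isolation.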
