ResembleAI/
Pricing: $1.00 / 1M characters
09/04 🔥 Introducing Chatterbox Multilingual in 23 Languages! We're excited to introduce Chatterbox and Chatterbox Multilingual, Resemble AI's production-grade open source TTS models. Chatterbox Multilingual supports Arabic, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Swahili, Turkish, Chinese out of the box. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.

Input text
Text to convert to speech
Voice ID
ID of a voice created on DeepInfra. (Default: empty)
Settings
ServiceTier
The service tier used for processing the request. When set to 'priority', the request will be processed with higher priority (only applies to models that support it).
TtsResponseFormat
Select the desired format for the speech output. Supported formats include mp3, opus, flac, wav, and pcm.
Language
Language code for the multilingual model (e.g., 'en', 'fr', 'zh'). Only used with chatterbox-multilingual. (Default: empty)
Exaggeration
Exaggeration factor for the speech (Default: 0, 0 ≤ exaggeration ≤ 1)
CFG
CFG factor for the speech (Default: 0, 0 ≤ cfg ≤ 1)
Temperature
Temperature for the speech (Default: 0.8, 0 ≤ temperature ≤ 2)
Seed
Seed for the random number generator (Default: empty, 0 ≤ seed ≤ 2147483647)
Top P
Top P for the speech (Default: 0.95, 0 ≤ top_p ≤ 1)
Min P
Min P for the speech (Default: 0, 0 ≤ min_p ≤ 1)
Repetition Penalty
Repetition Penalty for the speech (Default: 1.2, 0 ≤ repetition_penalty ≤ 5)
Top K
Top K for the speech (Default: 1000, 0 ≤ top_k ≤ 1000)
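The ranges listed above can be checked on the client before a request is submitted. The following is a minimal sketch, not part of any official client: the names `PARAM_RANGES` and `validate_settings` are hypothetical, and the setting keys are taken from the range expressions in the list above. It does not contact any API.

```python
# Hypothetical helper: allowed ranges copied from the parameter list above,
# as name -> (minimum, maximum). Not part of any official client library.
PARAM_RANGES = {
    "exaggeration": (0, 1),
    "cfg": (0, 1),
    "temperature": (0, 2),
    "seed": (0, 2147483647),
    "top_p": (0, 1),
    "min_p": (0, 1),
    "repetition_penalty": (0, 5),
    "top_k": (0, 1000),
}


def validate_settings(settings):
    """Return a copy of settings; raise ValueError on unknown or out-of-range values."""
    checked = {}
    for name, value in settings.items():
        if name not in PARAM_RANGES:
            raise ValueError(f"unknown setting: {name}")
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        checked[name] = value
    return checked


settings = validate_settings({"exaggeration": 0.5, "cfg": 0.5, "temperature": 0.8})
print(settings)
```

Rejecting bad values locally gives a clearer error than waiting for the service to refuse the request.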
Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support emotion exaggeration control, a powerful feature that makes your voices stand out. Try it now on our Hugging Face Gradio app.
If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service. It delivers reliable performance with ultra-low latency of under 200 ms, which makes it well suited to production use in agents, applications, or interactive media.
General Use (TTS and Voice Agents):
Settings of exaggeration=0.5 and cfg=0.5 work well for most prompts.
Setting cfg to around 0.3 can improve pacing.
Expressive or Dramatic Speech:
Try lower cfg values (e.g. ~0.3) and increase exaggeration to around 0.7 or higher.
Higher exaggeration tends to speed up speech; reducing cfg helps compensate with slower, more deliberate pacing.
Note: Ensure that the reference clip matches the specified language tag. Otherwise, language transfer outputs may inherit the accent of the reference clip's language. To mitigate this, set the CFG weight to 0.
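The tips above can be captured as a small table of starting points. This is a sketch under stated assumptions: the preset names, `BASELINE`, and `settings_for` are hypothetical and only illustrate the values mentioned in the tips. The open-source library may name the CFG argument differently from the `cfg` key used here, so check its signature before passing these as keyword arguments.

```python
# Hypothetical presets derived from the tips above; the names are illustrative only.
BASELINE = {"exaggeration": 0.5, "cfg": 0.5}

PRESETS = {
    "general": {},                                    # baseline works for most prompts
    "slower_pacing": {"cfg": 0.3},                    # cfg around 0.3 can improve pacing
    "expressive": {"exaggeration": 0.7, "cfg": 0.3},  # more drama, slower delivery
    "cross_language": {"cfg": 0.0},                   # avoid inheriting the reference accent
}


def settings_for(preset, **overrides):
    """Merge the baseline, the named preset, and any explicit overrides."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return {**BASELINE, **PRESETS[preset], **overrides}


print(settings_for("expressive"))
print(settings_for("cross_language", exaggeration=0.6))
```

Treat these values as starting points to tune by ear, not as fixed recommendations.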
pip install chatterbox-tts
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

# Load the pretrained English model onto the GPU.
model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
# Synthesize with the built-in voice and save at the model's sample rate.
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)

# Multilingual example
import torchaudio as ta
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Load the pretrained multilingual model onto the GPU.
multilingual_model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")

french_text = "Bonjour, comment ça va? Ceci est le modèle de synthèse vocale multilingue Chatterbox, il prend en charge 23 langues."
wav_french = multilingual_model.generate(french_text, language_id="fr")
# Save with the multilingual model's own sample rate.
ta.save("test-french.wav", wav_french, multilingual_model.sr)

chinese_text = "你好,今天天气真不错,希望你有一个愉快的周末。"
wav_chinese = multilingual_model.generate(chinese_text, language_id="zh")
ta.save("test-chinese.wav", wav_chinese, multilingual_model.sr)
See example_tts.py for more examples.
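When the language comes from user input, it can help to map a language name to a code before calling the multilingual model. The sketch below pairs the 23 languages named in the introduction with their ISO 639-1 codes. Only 'en', 'fr', and 'zh' appear in this document, so whether the model accepts exactly these strings for the other languages is an assumption. `LANGUAGE_CODES` and `language_id_for` are hypothetical names.

```python
# ISO 639-1 codes for the 23 languages named in the introduction.
# Assumption: the model accepts these exact strings; this document
# only confirms 'en', 'fr', and 'zh'.
LANGUAGE_CODES = {
    "Arabic": "ar", "Danish": "da", "German": "de", "Greek": "el",
    "English": "en", "Spanish": "es", "Finnish": "fi", "French": "fr",
    "Hebrew": "he", "Hindi": "hi", "Italian": "it", "Japanese": "ja",
    "Korean": "ko", "Malay": "ms", "Dutch": "nl", "Norwegian": "no",
    "Polish": "pl", "Portuguese": "pt", "Russian": "ru", "Swedish": "sv",
    "Swahili": "sw", "Turkish": "tr", "Chinese": "zh",
}


def language_id_for(name):
    """Return the language code for a language name, ignoring case and padding."""
    key = name.strip().title()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {name!r}")
    return LANGUAGE_CODES[key]


print(language_id_for("french"))
print(len(LANGUAGE_CODES))
```

The returned value could then be passed as the `language_id` argument shown in the multilingual example.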
Every audio file generated by Chatterbox includes Resemble AI's Perth (Perceptual Threshold) Watermarker - imperceptible neural watermarks that survive MP3 compression, audio editing, and common manipulations while maintaining nearly 100% detection accuracy.
Don't use this model to do bad things. Prompts are sourced from freely available data on the internet.
© 2026 Deep Infra. All rights reserved.