ResembleAI/
$10.00 / 1M characters
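As a rough illustration of the listed rate, the sketch below estimates the cost of synthesizing a piece of text. It assumes that every character of the input, including whitespace and punctuation, is billed; the page does not say how characters are counted, so treat the result as an approximation. The function name `estimate_cost_usd` is hypothetical.

```python
# Listed rate: $10.00 per 1M characters.
PRICE_PER_MILLION_CHARS_USD = 10.00


def estimate_cost_usd(text: str) -> float:
    """Estimate the synthesis cost of `text` in US dollars.

    Assumption: every character (spaces and punctuation included) is billed.
    """
    return len(text) * PRICE_PER_MILLION_CHARS_USD / 1_000_000


if __name__ == "__main__":
    script = "Hello from Chatterbox. " * 100  # 2,300 characters
    print(f"{len(script)} characters cost about ${estimate_cost_usd(script):.4f}")
```

At this rate, a 2,500-character script would come to roughly two and a half cents.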
Input text
Text to convert to speech
Voice ID
Voice ID created on Deep Infra. (Default: empty)
Exaggeration
Exaggeration factor for the speech (Default: 0.25, 0 ≤ exaggeration ≤ 1)
CFG
CFG factor for the speech (Default: 0.5, 0.1 ≤ cfg ≤ 1)
Temperature
Temperature for the speech (Default: 0.7, 0 ≤ temperature ≤ 2)
Seed
Seed for the random number generator (Default: empty, 0 ≤ seed ≤ 2147483647)
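The parameters above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's actual client: the function name `build_tts_payload` and the dictionary keys (`text`, `voice_id`, `exaggeration`, `cfg`, `temperature`, `seed`) are assumptions chosen to mirror the field labels, and the real request schema may differ. Only the defaults and ranges are taken from the listing.

```python
from typing import Optional

MAX_SEED = 2147483647  # upper bound listed for the seed


def build_tts_payload(
    text: str,
    voice_id: Optional[str] = None,
    exaggeration: float = 0.25,
    cfg: float = 0.5,
    temperature: float = 0.7,
    seed: Optional[int] = None,
) -> dict:
    """Validate the documented ranges and return a request body as a dict.

    The key names are hypothetical; check the provider's API reference.
    """
    if not 0 <= exaggeration <= 1:
        raise ValueError("exaggeration must satisfy 0 <= exaggeration <= 1")
    if not 0.1 <= cfg <= 1:
        raise ValueError("cfg must satisfy 0.1 <= cfg <= 1")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must satisfy 0 <= temperature <= 2")
    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must satisfy 0 <= seed <= {MAX_SEED}")

    payload = {
        "text": text,
        "exaggeration": exaggeration,
        "cfg": cfg,
        "temperature": temperature,
    }
    # Optional fields default to empty, so they are left out when unset.
    if voice_id:
        payload["voice_id"] = voice_id
    if seed is not None:
        payload["seed"] = seed
    return payload


if __name__ == "__main__":
    print(build_tts_payload("Hello there!", seed=42))
```

Validating locally gives a clear error message for an out-of-range value instead of relying on however the server reports it.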
We're excited to introduce Chatterbox, Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support emotion exaggeration control, a powerful feature that makes your voices stand out. Try it now on our Hugging Face Gradio app.
If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (link). It delivers reliable performance with ultra-low latency under 200 ms, which makes it well suited to production use in agents, applications, or interactive media.
General Use (TTS and Voice Agents):
  The settings (exaggeration=0.5, cfg=0.5) work well for most prompts.
  Lowering cfg to around 0.3 can improve pacing.
Expressive or Dramatic Speech:
  Use lower cfg values (e.g. ~0.3) and increase exaggeration to around 0.7 or higher.
  Higher exaggeration tends to speed up speech; reducing cfg helps compensate with slower, more deliberate pacing.
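The tuning tips above can be collected into named presets so that callers pick a style rather than remembering numbers. This is only a sketch: the preset names and the `settings_for` helper are hypothetical, and the values are the approximate ones quoted in the tips (0.7 is used as the starting point for "around 0.7 or higher").

```python
# Preset values taken from the tips; the names are illustrative only.
PRESETS = {
    # Works well for most prompts.
    "general": {"exaggeration": 0.5, "cfg": 0.5},
    # Same expressiveness with a lower cfg to improve pacing.
    "general_slower_pacing": {"exaggeration": 0.5, "cfg": 0.3},
    # More dramatic delivery; the lower cfg offsets the faster speech
    # that higher exaggeration tends to produce.
    "expressive": {"exaggeration": 0.7, "cfg": 0.3},
}


def settings_for(style: str) -> dict:
    """Return a copy of the preset for `style`, or raise ValueError."""
    try:
        return dict(PRESETS[style])
    except KeyError:
        raise ValueError(
            f"unknown style {style!r}; choose from {sorted(PRESETS)}"
        ) from None


if __name__ == "__main__":
    print(settings_for("expressive"))
```

Returning a copy keeps the shared presets unchanged if a caller then nudges `exaggeration` upward for a particular line.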