ResembleAI/chatterbox-turbo

$1.00 / 1M characters

Chatterbox is a family of three state-of-the-art, open-source text-to-speech models by Resemble AI. We are excited to introduce Chatterbox-Turbo, our most efficient model yet. Built on a streamlined 350M-parameter architecture, Turbo delivers high-quality speech with less compute and VRAM than our previous models. We have also distilled the speech-token-to-mel decoder, previously a bottleneck, reducing generation from 10 steps to just one while retaining high-fidelity audio output.

Paralinguistic tags are now native to the Turbo model, allowing you to use [cough], [laugh], [chuckle], and more to add distinct realism. Although Turbo was built primarily for low-latency voice agents, it also excels at narration and creative workflows.

If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (link). It delivers reliable performance with ultra-low latency of under 200 ms, making it ideal for production use in agents, applications, or interactive media.

Create Voice HTTP/cURL API

DeepInfra supports custom voices.

Create voice

The following curl command creates a custom voice by uploading an audio sample (here, hello.wav) together with a name and a description.

curl -X POST "https://api.deepinfra.com/v1/voices/add" \
  -H "Content-Type: multipart/form-data" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -F "audio=@hello.wav" \
  -F "name=John Doe" \
  -F "description=John Doe's voice"

The request returns a JSON response similar to the following:

{
  "user_id": "gh:10000000", 
  "voice_id": "abcd1234abcd1234abcd",
  "name": "John Doe",
  "description": "John Doe's voice",
  "created_at": 1723851387,
  "updated_at": 1723851387
}
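
As a rough illustration of how a client might consume this response, the Python sketch below parses the sample JSON and pulls out the voice_id. The helper name parse_voice is hypothetical, and treating created_at and updated_at as Unix timestamps in seconds (UTC) is an assumption based on the sample values, not something the page states.

```python
import json
from datetime import datetime, timezone

# Sample response body copied from the example above; a real call would
# return different identifiers and timestamps.
response_body = """
{
  "user_id": "gh:10000000",
  "voice_id": "abcd1234abcd1234abcd",
  "name": "John Doe",
  "description": "John Doe's voice",
  "created_at": 1723851387,
  "updated_at": 1723851387
}
"""


def parse_voice(body: str) -> dict:
    """Hypothetical helper: decode the JSON body and convert the timestamps."""
    voice = json.loads(body)
    # Assumption: created_at / updated_at are Unix timestamps in seconds (UTC).
    for key in ("created_at", "updated_at"):
        voice[key] = datetime.fromtimestamp(voice[key], tz=timezone.utc)
    return voice


voice = parse_voice(response_body)
print(voice["voice_id"], voice["created_at"].isoformat())
```

The voice_id is the field a client would most likely need to keep, since it identifies the newly created voice in this response.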
