
Zyphra/Zonos-v0.1-hybrid

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with, or even surpassing, top TTS providers. The model generates highly natural speech from text prompts when given a speaker embedding or audio prefix, and can accurately clone a voice from a reference clip only a few seconds long. The conditioning setup also allows fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44 kHz.

$7.00 per M characters

Create Voice HTTP/cURL API

DeepInfra supports custom voices.

Create voice

The following curl command creates a voice by uploading an audio file (here, hello.wav) together with a name and a description.

curl -X POST "https://api.deepinfra.com/v1/voices/add" \
  -H "Content-Type: multipart/form-data" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -F "audio=@hello.wav" \
  -F "name=John Doe" \
  -F "description=John Doe's voice"

The response will look similar to the following:

{
  "user_id": "gh:10000000", 
  "voice_id": "abcd1234abcd1234abcd",
  "name": "John Doe",
  "description": "John Doe's voice",
  "created_at": 1723851387,
  "updated_at": 1723851387
}
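
For readers who prefer Python to curl, the same request can be sketched with only the standard library. The endpoint, the form field names (audio, name, description), and the bearer-token header are taken from the curl example above; the function names are hypothetical, the file part is sent as application/octet-stream on the assumption that the server accepts it, and reading created_at as Unix seconds is likewise an assumption based on the sample response.

```python
import json
import os
import uuid
import urllib.request
from datetime import datetime, timezone

# Endpoint copied from the curl example.
API_URL = "https://api.deepinfra.com/v1/voices/add"


def encode_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # File part; the generic content type is an assumption, not documented.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_create_voice_request(token, audio_bytes, filename, name, description):
    """Build (but do not send) the POST request from the curl example."""
    body, content_type = encode_multipart(
        {"name": name, "description": description},
        "audio",
        filename,
        audio_bytes,
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Authorization": f"Bearer {token}",
        },
    )


def create_voice(token, audio_path, name, description):
    """Upload an audio file and return the parsed JSON response."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    request = build_create_voice_request(
        token, audio_bytes, os.path.basename(audio_path), name, description
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def created_at_utc(voice):
    """Interpret created_at as Unix seconds (an assumption) in UTC."""
    return datetime.fromtimestamp(voice["created_at"], tz=timezone.utc)
```

A call such as `create_voice(os.environ["DEEPINFRA_TOKEN"], "hello.wav", "John Doe", "John Doe's voice")` would then return a dictionary shaped like the sample response, from which `voice["voice_id"]` can be read. Building the request separately from sending it keeps the encoding step easy to inspect without touching the network.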

