
Zyphra/Zonos-v0.1-transformer

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with, or even surpassing, top TTS providers. Our model generates highly natural speech from text prompts when given a speaker embedding or an audio prefix, and it can accurately clone a voice from a reference clip only a few seconds long. The conditioning setup also allows fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model natively outputs speech at 44 kHz.
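The native output rate determines how large uncompressed audio gets, which matters when requesting raw formats such as pcm. The sketch below works through that arithmetic. It assumes "44 kHz" means 44,100 samples per second and that raw audio is mono 16-bit PCM (2 bytes per sample); neither detail is stated on this page, and the helper names are hypothetical.

```python
# Hypothetical helpers. Assumptions (not confirmed by this page):
# "44 kHz" = 44,100 samples per second; raw audio is mono 16-bit PCM.
SAMPLE_RATE_HZ = 44_100


def num_samples(duration_s: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples in a clip of the given duration."""
    return round(duration_s * sample_rate)


def pcm_size_bytes(duration_s: float, sample_rate: int = SAMPLE_RATE_HZ,
                   bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Uncompressed size: samples x bytes per sample x channels."""
    return num_samples(duration_s, sample_rate) * bytes_per_sample * channels


# A 10-second clip: 441,000 samples and 882,000 bytes (about 0.84 MiB).
print(num_samples(10.0), pcm_size_bytes(10.0))
```

Compressed formats such as mp3, opus, or flac will be smaller; the formula only bounds the raw case.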

Public
$7.00 per million characters
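Under per-character pricing, the cost of a request scales with the length of the input text. The minimal estimator below assumes billing is linear in the number of input characters at the listed $7.00 per million, with no minimum charge or rounding, and that a "character" is a Unicode code point (what Python's `len` counts). The function name is hypothetical, and the provider may count characters differently.

```python
# Hypothetical estimator. Assumes linear billing per input character at the
# listed rate, with no minimum charge or rounding.
PRICE_PER_MILLION_CHARS_USD = 7.00


def estimate_cost_usd(text: str,
                      price_per_million: float = PRICE_PER_MILLION_CHARS_USD) -> float:
    """Estimated cost of synthesizing `text`, counting Unicode code points."""
    return len(text) / 1_000_000 * price_per_million


# 250,000 characters at $7.00 per million characters is $1.75.
print(f"${estimate_cost_usd('x' * 250_000):.2f}")
```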

Input

Text to convert to speech

Select the desired voice for the speech output.

Voice ID to use for the speech. Either preset_voice or voice_id should be provided. (Default: empty)

Select the desired language for the speech output.

Select the desired format for the speech output. Supported formats include mp3, opus, flac, wav, and pcm.

Speaking rate of the speech. (Default: empty, 5 ≤ speaker_rate ≤ 35)

Seed for the random number generator. (Default: empty, 0 ≤ seed ≤ 2147483647)
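The input constraints above can be checked client-side before a request is sent. The sketch below is an illustration only. Only `preset_voice`, `voice_id`, `speaker_rate`, and `seed` are named on this page; the keys `text`, `language`, and `output_format` and the function `build_request` are hypothetical stand-ins. It also assumes "either preset_voice or voice_id should be provided" means at least one must be set.

```python
from typing import Optional

# Formats listed on this page.
SUPPORTED_FORMATS = {"mp3", "opus", "flac", "wav", "pcm"}
SEED_MAX = 2_147_483_647


def build_request(text: str,
                  preset_voice: Optional[str] = None,
                  voice_id: Optional[str] = None,
                  language: Optional[str] = None,       # hypothetical key name
                  output_format: Optional[str] = None,  # hypothetical key name
                  speaker_rate: Optional[float] = None,
                  seed: Optional[int] = None) -> dict:
    """Validate inputs against the documented ranges and build a payload dict.

    Unset (None) optional fields are omitted so server-side defaults apply.
    """
    if not text:
        raise ValueError("text must be non-empty")
    # Assumption: at least one voice selector is required.
    if preset_voice is None and voice_id is None:
        raise ValueError("provide preset_voice or voice_id")
    if output_format is not None and output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {output_format}")
    if speaker_rate is not None and not 5 <= speaker_rate <= 35:
        raise ValueError("speaker_rate must be in [5, 35]")
    if seed is not None and not 0 <= seed <= SEED_MAX:
        raise ValueError(f"seed must be in [0, {SEED_MAX}]")

    payload = {"text": text}  # hypothetical key for the input text
    optional = {
        "preset_voice": preset_voice,
        "voice_id": voice_id,
        "language": language,
        "output_format": output_format,
        "speaker_rate": speaker_rate,
        "seed": seed,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


print(build_request("Hello, world!", preset_voice="voice-a",
                    speaker_rate=15, seed=42))
```

Leaving unset fields out of the payload, rather than sending explicit nulls, lets the service apply its own defaults for anything marked "Default: empty".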

Output