hexgrad/
$0.62 / 1M characters
Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

Input text: Text to convert to speech.
Settings:
Response format (TtsResponseFormat): Output format for the speech. Supported formats are mp3, opus, flac, wav, and pcm.
Voice: Voice for the speech output. Multiple voices can be selected to combine and mix them.
Speed: Speed of the speech, with 0.25 ≤ speed ≤ 4. (Default: empty)
Stream: Whether to stream the output.
Return Timestamps: Whether to return timestamps.
Sample Rate: Sample rate for the output audio. (Default: empty)
Target Min Tokens: Minimum number of tokens for the output. (Default: empty)
Target Max Tokens: Maximum number of tokens for the output. (Default: empty)
Absolute Max Tokens: Absolute maximum number of tokens for the output. (Default: empty)
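
The constraints listed in the settings above (five output formats, a speed between 0.25 and 4, optional fields that default to empty) can be checked before a request is submitted. The following is a minimal sketch only: the field names `response_format`, `speed`, and `voices`, and the function `validate_tts_settings`, are hypothetical and are not taken from any published request schema.

```python
# Values taken from the settings description; field names below are hypothetical.
SUPPORTED_FORMATS = {"mp3", "opus", "flac", "wav", "pcm"}
SPEED_MIN, SPEED_MAX = 0.25, 4.0


def validate_tts_settings(settings: dict) -> list:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []

    # A missing field is treated as "empty" (the documented default) and skipped.
    fmt = settings.get("response_format")
    if fmt is not None and fmt not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt!r}")

    speed = settings.get("speed")
    if speed is not None and not (SPEED_MIN <= speed <= SPEED_MAX):
        problems.append(f"speed {speed} outside [{SPEED_MIN}, {SPEED_MAX}]")

    voices = settings.get("voices")
    if voices is not None and len(voices) == 0:
        problems.append("voice list is empty")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid field at once.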
License: apache-2.0
🐈 GitHub: https://github.com/hexgrad/kokoro
🚀 Demo: https://hf.co/spaces/hexgrad/Kokoro-TTS
| Model | Published | Training Data | Langs & Voices | SHA256 |
|---|---|---|---|---|
| v1.0 | 2025 Jan 27 | Few hundred hrs | 8 & 54 | 496dba11 |
| v0.19 | 2024 Dec 25 | <100 hrs | 1 & 10 | 3b0c392f |

| Training Costs | v0.19 | v1.0 | Total |
|---|---|---|---|
| in A100 80GB GPU hours | 500 | 500 | 1000 |
| average hourly rate | $0.80/h | $1.20/h | $1/h |
| in USD | $400 | $600 | $1000 |
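
The totals in the training cost table follow from GPU hours multiplied by the average hourly rate. A short check of that arithmetic, using only the figures from the table (the variable names are illustrative):

```python
# (GPU hours, average USD per hour) for each version, as given in the table.
runs = {"v0.19": (500, 0.80), "v1.0": (500, 1.20)}

# Cost per version is hours times rate.
costs = {name: hours * rate for name, (hours, rate) in runs.items()}

total_hours = sum(hours for hours, _ in runs.values())
total_cost = sum(costs.values())

# The "Total" column's hourly rate is the cost-weighted average, not a simple mean.
average_rate = total_cost / total_hours

print(costs)
print(total_hours, round(total_cost, 2), round(average_rate, 2))
```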
You can run this basic cell on Google Colab. Listen to samples. For more languages and details, see Advanced Usage.
```python
# Install the package and the espeak-ng system dependency (Colab/Jupyter cell syntax).
# The version spec is quoted so the shell does not treat ">" as an output redirect.
!pip install -q "kokoro>=0.9.2" soundfile
!apt-get -qq -y install espeak-ng > /dev/null 2>&1

from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import torch

pipeline = KPipeline(lang_code='a')
text = '''
[Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
'''
generator = pipeline(text, voice='af_heart')

# Each yielded item unpacks into gs, ps, and the audio for one segment.
# Play every segment (autoplaying only the first) and save it at 24 kHz.
for i, (gs, ps, audio) in enumerate(generator):
    print(i, gs, ps)
    display(Audio(data=audio, rate=24000, autoplay=i==0))
    sf.write(f'{i}.wav', audio, 24000)
```
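
The loop in the cell above writes one WAV file per generated segment. If a single file is preferred, the segments can be joined before writing. The sketch below uses only the Python standard library and assumes each segment is a sequence of float samples in the range -1 to 1 at 24000 Hz. The helper name `write_joined_wav` is hypothetical, and depending on the type the pipeline returns, each `audio` value may first need converting to a plain sequence of floats.

```python
import struct
import wave

SAMPLE_RATE = 24000  # same rate as used in the cell above


def write_joined_wav(segments, path, sample_rate=SAMPLE_RATE):
    """Join float segments (assumed in [-1, 1]) into one 16-bit mono WAV file.

    Returns the number of samples written.
    """
    frames = bytearray()
    for segment in segments:
        for sample in segment:
            # Clip so out-of-range values cannot overflow the 16-bit range.
            clipped = max(-1.0, min(1.0, float(sample)))
            frames += struct.pack("<h", int(clipped * 32767))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))

    return len(frames) // 2
```

For example, collecting the `audio` values from the loop into a list and passing that list to `write_joined_wav(collected, 'joined.wav')` would produce one continuous file instead of numbered pieces.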
Under the hood, kokoro uses misaki, a G2P library at https://github.com/hexgrad/misaki
Architecture:
Architected by: Li et al @ https://github.com/yl4579/StyleTTS2
Trained by: @rzvzn on Discord
Languages: Multiple
Model SHA256 Hash: 496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4
Data: Kokoro was trained exclusively on permissive/non-copyrighted audio data and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:
Total Dataset Size: A few hundred hours of audio
Total Training Cost: About $1000 for 1000 GPU hours on A100 80GB
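
The model SHA256 hash listed above lets you confirm that a downloaded weights file is intact. A minimal sketch using Python's `hashlib`; the function names and the file path in the usage note are placeholders, not names confirmed by the source.

```python
import hashlib

# Full hash as published in the model facts above.
EXPECTED_SHA256 = "496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

A call such as `matches_published_hash("path/to/weights.pth")` (placeholder path) returns `True` only for an exact match. The eight-character value `496dba11` shown for v1.0 in the model table is the prefix of this full hash.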
The following CC BY audio was part of the dataset used to train Kokoro v1.0.
| Audio Data | Duration Used | License | Added to Training Set After |
|---|---|---|---|
| Koniwa tnc | <1h | CC BY 3.0 | v0.19 / 22 Nov 2024 |
| SIWIS | <11h | CC BY 4.0 | v0.19 / 22 Nov 2024 |

© 2025 Deep Infra. All rights reserved.