Kokoro is an open-weight text-to-speech (TTS) model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient. Because its weights are Apache-licensed, Kokoro can be deployed anywhere, from production environments to personal projects.
You can use cURL or any other HTTP client to run inference:
curl -X POST \
-d '{"text": "The quick brown fox jumps over the lazy dog"}' \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-H 'Content-Type: application/json' \
'https://api.deepinfra.com/v1/inference/hexgrad/Kokoro-82M'
The response will look similar to the following (the values shown are placeholders):
{
  "audio": null,
  "input_character_length": 0,
  "output_format": "",
  "words": [
    {
      "end": 1.0,
      "start": 0.0,
      "text": "Hello"
    },
    {
      "end": 5.0,
      "start": 4.0,
      "text": "World"
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
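To make the `words` array concrete, here is a minimal Python sketch that reads the word-level entries from a response shaped like the example above. The helper name `word_timings` is hypothetical, and treating `start` and `end` as offsets in seconds is an assumption; the example response does not state the unit.

```python
import json

# Sample response trimmed to the fields used below; values are copied from
# the placeholder example response shown above.
SAMPLE_RESPONSE = """
{
  "audio": null,
  "words": [
    {"end": 1.0, "start": 0.0, "text": "Hello"},
    {"end": 5.0, "start": 4.0, "text": "World"}
  ]
}
"""


def word_timings(response: dict) -> list:
    """Return a (text, start, end) tuple for each entry in response["words"].

    Hypothetical helper: assumes each entry carries "text", "start", and
    "end" keys, as in the example response.
    """
    # "words" may be absent or null, so fall back to an empty list.
    words = response.get("words") or []
    return [(w["text"], float(w["start"]), float(w["end"])) for w in words]


if __name__ == "__main__":
    for text, start, end in word_timings(json.loads(SAMPLE_RESPONSE)):
        # The unit is assumed to be seconds.
        print(f"{start:5.2f} -> {end:5.2f}  {text}")
```

Timings like these could be used, for instance, to align captions with the generated audio.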
output_format (string)
Output format for the speech.
Default value: "wav"
Allowed values: mp3, opus, flac, wav, pcm

webhook (file)
The webhook to call when inference is done. By default, you get the output in the response to your inference request.
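
The parameters above can be combined with `text` into a single request body. The following Python sketch validates `output_format` against the allowed values listed above and builds, without sending, the same POST request as the cURL example, using only the standard library. The function names `build_payload` and `build_request` are hypothetical, and passing `webhook` as a URL string is an assumption, since the listing above gives its type as `file`.

```python
import json
import os
import urllib.request

# Endpoint taken from the cURL example.
API_URL = "https://api.deepinfra.com/v1/inference/hexgrad/Kokoro-82M"
# Allowed values as listed for output_format.
ALLOWED_FORMATS = {"mp3", "opus", "flac", "wav", "pcm"}


def build_payload(text, output_format="wav", webhook=None):
    """Build the JSON body, rejecting formats outside the allowed list."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"unsupported output_format {output_format!r}; "
            f"expected one of {sorted(ALLOWED_FORMATS)}"
        )
    payload = {"text": text, "output_format": output_format}
    if webhook is not None:
        # Assumption: the webhook is supplied as a URL string.
        payload["webhook"] = webhook
    return payload


def build_request(payload, token):
    """Wrap the payload in a POST request with the cURL example's headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    token = os.environ.get("DEEPINFRA_TOKEN", "<your-token>")
    payload = build_payload("The quick brown fox jumps over the lazy dog", "mp3")
    request = build_request(payload, token)
    print(request.get_method(), request.full_url)
    print(json.dumps(payload, indent=2))
    # Sending is left out on purpose; with a valid token and network access,
    # urllib.request.urlopen(request) would perform the call.
```

Validating the format locally is a design choice that surfaces a typo before any request is made; the service may perform its own validation as well.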