Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
You can use cURL or any other HTTP client to run inference:
curl -X POST \
-d '{"text": "The quick brown fox jumps over the lazy dog"}' \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-H 'Content-Type: application/json' \
'https://api.deepinfra.com/v1/inference/hexgrad/Kokoro-82M'
which will give you back something similar to:
{
  "audio": null,
  "input_character_length": 0,
  "output_format": "",
  "words": [
    {
      "text": "Hello",
      "start": 0.0,
      "end": 1.0,
      "confidence": 0.5
    },
    {
      "text": "World",
      "start": 4.0,
      "end": 5.0,
      "confidence": 0.5
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
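
For readers who prefer Python to cURL, the same call can be sketched with the standard library alone. This is a minimal sketch rather than an official client: the helper names `build_request`, `synthesize`, and `word_timings` are hypothetical, and the response is assumed to have the shape shown in the sample above.

```python
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/inference/hexgrad/Kokoro-82M"


def build_request(text, token):
    """Build (but do not send) the same POST request as the cURL example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )


def synthesize(text, token):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(build_request(text, token)) as resp:
        return json.load(resp)


def word_timings(response):
    """Return (text, start, end) for each entry in the "words" list.

    Assumes the response shape from the sample above; a missing or
    null "words" field yields an empty list.
    """
    return [(w["text"], w["start"], w["end"]) for w in response.get("words") or []]
```

Calling `synthesize("The quick brown fox jumps over the lazy dog", token)` with the value of `$DEEPINFRA_TOKEN` performs the request, and `word_timings(result)` then gives the per-word start and end values from the response.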
output_format (string)
Output format for the speech.
Default value: "wav"
Allowed values: mp3, opus, flac, wav, pcm

webhook (file)
The webhook to call when inference is done. By default, you will get the output in the response to your inference request.
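
To choose an output format, one option is to validate the value on the client before sending it. The sketch below is illustrative only: `build_payload` is a hypothetical helper, and it assumes `output_format` is sent in the same JSON body as `text`, which the parameter list suggests but does not state outright. The `webhook` parameter is left out because its expected value is not described in enough detail here.

```python
import json

# Allowed values as listed in the parameter reference above.
ALLOWED_FORMATS = ("mp3", "opus", "flac", "wav", "pcm")


def build_payload(text, output_format="wav"):
    """Return a request body dict, rejecting unsupported output formats.

    Assumption: output_format belongs in the same JSON object as text.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"output_format must be one of {ALLOWED_FORMATS}, got {output_format!r}"
        )
    return {"text": text, "output_format": output_format}


# Serialize a payload that asks for MP3 output instead of the default WAV.
print(json.dumps(build_payload("The quick brown fox jumps over the lazy dog", "mp3")))
```

The resulting JSON string can be passed as the `-d` argument of the cURL command or as the body of any other HTTP client request.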